ML Modeling - toddbadams/strawberry GitHub Wiki

Epic: ML Labeling & Modeling

  1. 🔴 As a Data Scientist, I want to generate historical buy/hold/sell labels based on rule-counts so that I have ground truth for the past 20 years.

    • Acceptance: Column label with values in {buy, hold, sell} populates for each quarterly row.
    • Docs:
    • Status: 🔴
    • Priority: 🔴
  2. 🔴 As a Data Scientist, I want to build and evaluate a LightGBM classifier on this labeled data so that I can predict labels on new quarters.

    • Acceptance: Model training script runs, outputs accuracy/precision/recall via cross-validation.
    • Docs:
    • Status: 🔴
    • Priority: 🔴
  3. 🔴 As a Data Scientist, I want to serialize the trained model to disk so that the serving layer can load it.

    • Acceptance: model.pkl (or equivalent) saved to artifact storage.
    • Docs:
    • Status: 🔴
    • Priority: 🔴
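
The labeling story could be sketched roughly as follows. This is only an illustration: the wiki does not define the rules or how many passed rules make a buy or a sell, so the thresholds, the `rules` mapping on each row, and the function names are all hypothetical.

```python
# Hypothetical thresholds: the wiki does not say how many passed rules
# make a "buy" or a "sell", so these defaults are placeholders.
BUY_MIN_RULES = 4
SELL_MAX_RULES = 1


def label_from_rule_count(passed_rules: int,
                          buy_min: int = BUY_MIN_RULES,
                          sell_max: int = SELL_MAX_RULES) -> str:
    """Map the number of rules a quarterly row passes to buy/hold/sell."""
    if passed_rules >= buy_min:
        return "buy"
    if passed_rules <= sell_max:
        return "sell"
    return "hold"


def label_rows(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with a `label` column added.

    Each row is assumed to carry a `rules` mapping of rule name -> bool.
    """
    labeled = []
    for row in rows:
        passed = sum(1 for ok in row["rules"].values() if ok)
        labeled.append({**row, "label": label_from_rule_count(passed)})
    return labeled
```

Because every row gets exactly one of the three values, the acceptance check reduces to asserting that no quarterly row is missing a `label`.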

Absolutely, Todd. Here's a complete set of user stories aligned with your plan: train a dividend-predicting ML model on your dev machine and deploy the inference pipeline to a Raspberry Pi. These stories cover modeling, feature engineering, training, deployment, and usage, all mapped to real tasks.


📘 Epic: Predict Future Dividend Performance for Buy-Signal Stocks


1. As a quant developer, I want to train a model on my dev machine...

🔹 ...so I can predict combined dividend yield and growth two years into the future.

  • Story 1.1: As a quant developer, I want to use 20 years of quarterly stock fundamentals, dividends, and financial ratios as features, so the model learns from long-term trends.

  • Story 1.2: As a quant developer, I want to create lagged, rolling, and delta features for each stock/quarter, so the model captures momentum and growth signals.

  • Story 1.3: As a quant developer, I want to train an XGBoost or LightGBM regression model, so I can accurately forecast dividend performance.

  • Story 1.4: As a quant developer, I want to evaluate my model using RMSE and R² on a time-split validation set, so I ensure realistic forward-looking accuracy.

  • Story 1.5: As a quant developer, I want to analyze feature importance and SHAP values, so I can validate that the model aligns with financial intuition.
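
For Story 1.2, a library-free sketch of lagged, rolling, and delta features might look like the following. The row layout (`ticker`, `quarter`, and a numeric `dividend` column), the default lag and window, and the generated column names are assumptions made for illustration; a real pipeline would more likely use pandas.

```python
from collections import defaultdict
from statistics import mean


def add_trend_features(rows, column="dividend", lag=1, window=4):
    """Add lag, rolling-mean, and delta features per ticker.

    `rows` is a list of dicts with `ticker`, `quarter` (sortable, e.g.
    "2020Q1"), and the numeric `column`. Each feature uses only the
    current and earlier quarters of the same ticker; where the history
    is too short the feature is None.
    """
    by_ticker = defaultdict(list)
    for row in rows:
        by_ticker[row["ticker"]].append(row)

    out = []
    for series in by_ticker.values():
        series = sorted(series, key=lambda r: r["quarter"])
        values = [r[column] for r in series]
        for i, row in enumerate(series):
            lagged = values[i - lag] if i >= lag else None
            # Rolling mean over the `window` quarters ending at this one.
            rolling = (mean(values[i - window + 1:i + 1])
                       if i >= window - 1 else None)
            delta = values[i] - lagged if lagged is not None else None
            out.append({**row,
                        f"{column}_lag{lag}": lagged,
                        f"{column}_roll{window}": rolling,
                        f"{column}_delta{lag}": delta})
    return out
```

Grouping by ticker before shifting keeps one stock's history from leaking into another's features.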

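
The time-split evaluation in Story 1.4 can be illustrated with plain Python. The `quarter` key and the cutoff value are assumptions; the RMSE and R² formulas are the standard ones.

```python
import math


def time_split(rows, cutoff, key="quarter"):
    """Split rows into (train, valid); train is strictly before `cutoff`."""
    train = [r for r in rows if r[key] < cutoff]
    valid = [r for r in rows if r[key] >= cutoff]
    return train, valid


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination; undefined if y_true is constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Since the target looks two years ahead, it may also be worth dropping the last eight training quarters before the cutoff so that training targets do not overlap the validation period; that step is not shown here.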

2. As an engineer, I want to export the trained model...

🔹 ...so I can deploy it to the Raspberry Pi for fast, lightweight inference.

  • Story 2.1: As an engineer, I want to save the trained model to a portable format (.txt for LightGBM or .pkl for XGBoost), so I can deploy it on ARM-based devices.

  • Story 2.2: As an engineer, I want to serialize my feature engineering pipeline, so the same logic runs consistently during inference.
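
One lightweight way to approach Story 2.2 is to store the feature definition (column order and fill value) as a JSON file next to the model, so the dev machine and the Pi read the same spec. This is a sketch of that idea rather than a serialized pipeline object; the spec keys and function names are hypothetical.

```python
import json
from pathlib import Path


def save_feature_spec(spec: dict, path: str) -> None:
    """Write the feature spec as JSON so it can ship with the model file."""
    Path(path).write_text(json.dumps(spec, indent=2, sort_keys=True))


def load_feature_spec(path: str) -> dict:
    """Read the feature spec back, e.g. on the inference device."""
    return json.loads(Path(path).read_text())


def build_feature_vector(row: dict, spec: dict) -> list[float]:
    """Order the columns as the spec says; fill missing ones with its default."""
    return [float(row.get(name, spec["fill_value"]))
            for name in spec["columns"]]
```

For the model itself, LightGBM's `Booster.save_model` writes the text format mentioned in Story 2.1, and `lightgbm.Booster(model_file=...)` loads it again.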


3. As a platform engineer, I want to install the inference environment on the Raspberry Pi...

🔹 ...so I can make predictions locally with minimal resources.

  • Story 3.1: As a platform engineer, I want to install only lightweight dependencies (numpy, pandas, lightgbm), so I avoid overloading the Pi.

  • Story 3.2: As a platform engineer, I want to test loading the model and scoring a batch of feature vectors, so I can validate the setup without full retraining.


4. As a user or analyst, I want to score stocks flagged by the rules engine...

🔹 ...so I can rank buy candidates by projected dividend performance.

  • Story 4.1: As a user, I want to feed the model a list of stocks marked as "buy" by my existing rules engine, so I only evaluate relevant candidates.

  • Story 4.2: As a user, I want to get back a ranked list of stocks with predicted 2Y dividend yield + growth, so I can select the top income-generating opportunities.

  • Story 4.3: As a user, I want to set a threshold for the predicted value, so I can filter out underperformers.
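
Stories 4.1 to 4.3 combine into a filter, score, threshold, and sort step. The sketch below assumes each row carries the rules engine's `label` and that `predict` is any callable returning the predicted 2Y dividend score; both names are placeholders, not part of the existing code.

```python
def rank_buy_candidates(rows, predict, threshold=None):
    """Score rows flagged "buy" and return them best-first.

    `predict` maps a row to a predicted score; `threshold`, if given,
    drops rows that score below it.
    """
    scored = [{**row, "predicted_score": predict(row)}
              for row in rows if row.get("label") == "buy"]
    if threshold is not None:
        scored = [r for r in scored if r["predicted_score"] >= threshold]
    return sorted(scored, key=lambda r: r["predicted_score"], reverse=True)
```

Filtering on the label before calling `predict` means the model only runs on the candidates the rules engine already selected.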


5. (Optional) As a tech-savvy user, I want to expose this as a CLI or API...

🔹 ...so I can run predictions on demand or hook into other systems.

  • Story 5.1: As a user, I want to run a command-line tool on the Pi that scores a CSV of stock data and outputs predictions to a file.

  • Story 5.2: As a developer, I want to build a simple Flask API on the Pi, so I can send a ticker and get back the predicted dividend score.
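
The command-line tool in Story 5.1 could be sketched with the standard library alone. The scorer here is a placeholder that adds two assumed columns, `dividend_yield` and `dividend_growth`; a real tool would call the loaded model instead, and the argument names are illustrative.

```python
import argparse
import csv


def predict_score(row: dict) -> float:
    """Placeholder scorer; a real tool would call the loaded model here."""
    return float(row["dividend_yield"]) + float(row["dividend_growth"])


def score_csv(in_path: str, out_path: str) -> int:
    """Read `in_path`, append a `predicted_score` column, write `out_path`."""
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = list(reader.fieldnames or []) + ["predicted_score"]
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            writer.writerow({**row, "predicted_score": predict_score(row)})
    return len(rows)


def main(argv=None) -> None:
    """Parse the two file arguments and run the scoring step."""
    parser = argparse.ArgumentParser(description="Score a CSV of stock data.")
    parser.add_argument("input_csv")
    parser.add_argument("output_csv")
    args = parser.parse_args(argv)
    count = score_csv(args.input_csv, args.output_csv)
    print(f"scored {count} rows -> {args.output_csv}")
```

Calling `main()` from an `if __name__ == "__main__":` guard would turn this into a script that takes the input and output paths as its two arguments.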


📌 Notes

  • These stories are modular; if you work in an agile style, you can take them on in sprints (Modeling → Export → Inference → Interface).
  • You can also tag these by role: quant, ML engineer, platform/devops, user.
