LSTM Investment Bot - mvoggel/mvoggel.github.io GitHub Wiki

Overview:

A long short-term memory (LSTM)-based bot designed to predict stock trends and automate investments by learning patterns in historical stock movements. LSTMs are a type of recurrent neural network (RNN); in this model they take the separate data sources I collected as input sequences. The bot integrates with APIs such as Yahoo Finance (to fetch data) and Alpaca (to execute trades), and it is designed to improve over time.

Technical Write-Up:

You can access a copy of my code here!

Developed in Python using TensorFlow and Keras for deep learning. Implemented time-series forecasting to predict stock prices and computed technical indicators such as the 50-day moving average (MA50) and the Relative Strength Index (RSI). Evaluated model performance with mean squared error (MSE) and root mean squared error (RMSE).
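As a rough illustration of the two indicators, the sketch below computes MA50 and an RSI from a series of daily closing prices with pandas. The 14-day RSI window, the Wilder-style exponential smoothing, and the helper name `add_indicators` are assumptions for illustration; the project's own feature code may compute them differently.

```python
import numpy as np
import pandas as pd


def add_indicators(close: pd.Series, ma_window: int = 50, rsi_window: int = 14) -> pd.DataFrame:
    """Hypothetical helper: add a simple moving average and an RSI column to closing prices."""
    df = pd.DataFrame({"close": close})

    # Simple moving average of the last `ma_window` closes (NaN until enough history exists).
    df[f"MA{ma_window}"] = close.rolling(window=ma_window).mean()

    # Day-over-day changes split into gains and losses (both non-negative).
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = (-delta).clip(lower=0)

    # Wilder-style smoothing: exponential average with alpha = 1 / rsi_window (assumed).
    avg_gain = gain.ewm(alpha=1 / rsi_window, adjust=False, min_periods=rsi_window).mean()
    avg_loss = loss.ewm(alpha=1 / rsi_window, adjust=False, min_periods=rsi_window).mean()

    # RSI = 100 - 100 / (1 + RS); a zero average loss (with some gains) gives RS = inf, RSI = 100.
    rs = avg_gain / avg_loss
    df[f"RSI{rsi_window}"] = 100 - 100 / (1 + rs)
    return df


# Usage: a steadily rising series has no losses, so its RSI settles at 100.
rising = pd.Series(np.arange(100.0, 161.0))  # 61 toy daily closes: 100, 101, ..., 160
out = add_indicators(rising)
print(out.tail(3))
```

MA50 stays NaN for the first 49 rows because a full 50-day window is not yet available; dropping or trimming those rows before building training windows avoids feeding NaNs to the model.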


investment_bot/ 
│
├── __pycache__/
│
├── data_loader.py    # Fetches stock data via yfinance (Yahoo Finance) and stores it in a DataFrame
├── feature_engineering.py     # Gathers extra features (SEC filings, news sentiment, macro trends) as model inputs
├── main.py         # Orchestrates pipeline: loads data, builds LSTM, trains, predicts stock prices, and outputs results sequentially
├── model.py          # Builds, trains, and predicts stock prices using an LSTM neural network
├── config.py             # Configuration file
├── utils.py             # Merges and scales stock data for unified analysis across multiple sources
├── requirements.txt          # Dependencies file
└── README.md                 # Project documentation and roadmap

Implementation:

This project applies Long Short-Term Memory (LSTM) neural networks to time-series forecasting of stock prices, integrating historical data, financial indicators, and market sentiment into a structured pipeline.

1️⃣ Data Collection & Preprocessing

  • Stock Data: Retrieved via yfinance (daily close prices).

  • Feature Engineering: Placeholder for SEC filings, sentiment, and macroeconomic data.

  • Normalization: MinMax scaling maps price values to [0,1].

  • Time-Series Framing: Creates 60-day rolling windows for training.

    X_t = (P_{t-59}, P_{t-58}, ..., P_t) → Y_{t+1} = P_{t+1}
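The normalization and windowing steps above can be sketched in a few lines of NumPy. This is a minimal illustration assuming a single feature (the closing price) and a 60-step window; the helper names `min_max_scale` and `make_windows` are hypothetical, and the project itself may rely on scikit-learn's `MinMaxScaler` instead of a hand-written scaler.

```python
import numpy as np


def min_max_scale(prices: np.ndarray):
    """Map prices into [0, 1] and return (min, max) so the scaling can be undone later."""
    p_min, p_max = prices.min(), prices.max()
    return (prices - p_min) / (p_max - p_min), (p_min, p_max)


def make_windows(series: np.ndarray, window: int = 60):
    """Frame a 1-D series as supervised pairs: `window` past values -> the next value."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])  # the 60 values at indices t-60 .. t-1
        y.append(series[t])             # the value immediately after the window
    # Keras LSTM layers expect inputs shaped (samples, timesteps, features).
    return np.array(X)[..., np.newaxis], np.array(y)


prices = np.arange(100.0, 200.0)            # 100 toy daily closes
scaled, (p_min, p_max) = min_max_scale(prices)
X, y = make_windows(scaled, window=60)
print(X.shape, y.shape)                    # (40, 60, 1) (40,)
```

A series of N prices yields N - 60 windows. Computing the min and max on the training portion only, and reusing them for the test portion, keeps information from the test period out of the scaling.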

2️⃣ LSTM Model

  • Two LSTM Layers (50 units each) → Captures sequential dependencies.

  • Dense Layers → Fully connected layers map hidden states to predictions.

  • Loss Function → Mean Squared Error (MSE), which penalizes squared deviations from the true (scaled) prices.

  • Optimizer → Adam, which adapts the learning rate for each parameter during training.

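    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(60, num_features)),  # num_features: inputs per day
        LSTM(50, return_sequences=False),
        Dense(25),
        Dense(1),
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')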
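To make the architecture concrete, the size of this stack can be worked out by hand. A Keras `LSTM` layer with biases has four gates, each with an input kernel, a recurrent kernel, and a bias, giving 4 · (units · (input_dim + units) + units) weights; a `Dense` layer has input_dim · units + units. The sketch below applies these formulas under the assumption `num_features = 1` (close price only), which gives 10,400 / 20,200 / 1,275 / 26 parameters per layer and 31,901 in total; `model.summary()` should report the same numbers for an input shape of (60, 1).

```python
def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias vector (units).
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix (input_dim x units) plus one bias per unit.
    return input_dim * units + units


num_features = 1  # assumption: closing price is the only input feature
layers = [
    ("LSTM(50)", lstm_params(num_features, 50)),
    ("LSTM(50)", lstm_params(50, 50)),  # consumes the first layer's 50-dim hidden states
    ("Dense(25)", dense_params(50, 25)),
    ("Dense(1)", dense_params(25, 1)),
]
for name, count in layers:
    print(f"{name:<10}{count:>8,}")
total = sum(count for _, count in layers)
print(f"{'Total':<10}{total:>8,}")
```

Most of the weights sit in the second LSTM layer, because its input is the 50-dimensional hidden state of the first layer rather than a single price.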

3️⃣ Training & Prediction

  • Train-Test Split: 80% train, 20% test.
  • Sliding Window Prediction: Rolling 60-day inputs → next-day price outputs.
  • Inverse Scaling: Converts predictions back to actual price values.

    real_predictions = scaler.inverse_transform(predictions)  # undo MinMax scaling to get prices
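The split and inverse-scaling steps can be sketched with NumPy as follows. The split is chronological (no shuffling), so every test sample comes after every training sample in time. The toy arrays, the stand-in `predictions`, and the helpers `chronological_split` and `inverse_min_max` are hypothetical; `inverse_min_max` plays the role that `scaler.inverse_transform` plays for a min-max scaler.

```python
import numpy as np


def chronological_split(X: np.ndarray, y: np.ndarray, train_frac: float = 0.8):
    """Split windowed samples in time order: earliest 80% train, latest 20% test."""
    split = int(len(X) * train_frac)
    return X[:split], X[split:], y[:split], y[split:]


def inverse_min_max(scaled: np.ndarray, p_min: float, p_max: float) -> np.ndarray:
    """Undo min-max scaling: price = scaled * (max - min) + min."""
    return scaled * (p_max - p_min) + p_min


rng = np.random.default_rng(0)
X = rng.random((100, 60, 1))        # 100 toy windows of 60 scaled prices
y = np.linspace(0.0, 1.0, 100)      # toy scaled next-day targets
X_train, X_test, y_train, y_test = chronological_split(X, y)

# Hypothetical stand-in for model.predict(X_test), which returns shape (samples, 1).
predictions = y_test.reshape(-1, 1)
real_predictions = inverse_min_max(predictions, p_min=150.0, p_max=250.0)
print(X_train.shape, X_test.shape)  # (80, 60, 1) (20, 60, 1)
```

Shuffling before splitting would let the model train on days that fall after some test days, which tends to make test error look better than it would be in live use.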

4️⃣ Model Evaluation & Future Enhancements

  • Metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and directional accuracy.

The MSE formula is:

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$

where:

  • $Y_i$ = actual stock price at time $i$
  • $\hat{Y}_i$ = predicted stock price at time $i$
  • $n$ = total number of predictions

RMSE is the square root of MSE, which expresses the error in the same units as the stock price:

$$ RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2} $$
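MSE and RMSE, along with the MAPE and directional accuracy reported in the results, can be computed from paired arrays of actual and predicted prices. This is a sketch rather than the project's evaluation code: MAPE is expressed in percent and assumes no zero prices, and directional accuracy is assumed here to mean the fraction of days on which the predicted day-over-day change has the same sign as the actual change (definitions vary).

```python
import numpy as np


def evaluate(y_true, y_pred) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mse = np.mean(err ** 2)                     # (1/n) * sum (Y_i - Yhat_i)^2
    rmse = np.sqrt(mse)                         # same units as the price
    mape = np.mean(np.abs(err / y_true)) * 100  # percent; assumes no zero prices

    # Directional accuracy (one common definition): does the predicted series
    # move in the same direction as the actual series from one day to the next?
    same_direction = np.sign(np.diff(y_true)) == np.sign(np.diff(y_pred))
    dir_acc = np.mean(same_direction)
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape, "DirectionalAccuracy": dir_acc}


actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 103.0, 102.0, 100.0])
metrics = evaluate(actual, predicted)
print(metrics)
```

In this toy example the errors are -1, -1, -1 and 5, so MSE = 28 / 4 = 7.0 and RMSE = √7 ≈ 2.65; the predicted series gets the direction right on 2 of the 3 day-over-day moves.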


Next Steps:

  • Dropout regularization to control overfitting.

  • Hyperparameter tuning (batch size, sequence length).

  • Multivariate modeling (adding sentiment and macroeconomic data).

graph LR;
    A[Start] -->|Load Symbols| B[Define stock symbols]

    subgraph Data_Collection[Data Collection]
        B -->|Fetch Stock Data| C[Get stock prices from yfinance]
        B -->|Fetch Features| D[Get SEC, Sentiment, Macro data]
    end

    C -->|Merge Data| E[Merge Stock Data with Features]
    D -->|Merge Data| E

    E -->|Scale and Split Data| F[Normalize and Train-Test Split]

    %% Force Snaking to Second Row Using Reversed Links
    F --> G[Create 60-day LSTM sequences]
    G --> H[Define LSTM Layers]
    H --> I[Train LSTM for Each Stock]

    %% Snaking Back to Left Side
    I --> J[Predict Stock Prices]
    J --> K[Convert Back to Real Prices]
    K --> L[Compare Predictions vs Actual]

    %% Reverse direction explicitly
    L -.->|Loop Back| F

    %% End connection
    L --> N[End]

    style A fill:#ffcc00,stroke:#333,stroke-width:2px;
    style N fill:#ffcc00,stroke:#333,stroke-width:2px;
    style Data_Collection fill:#e3f2fd,stroke:#333,stroke-width:2px;

Results/Screenshots/Conclusions:

Model achieved consistent predictions for short-term trends. Reduced human intervention in portfolio management. Visualized performance metrics and model predictions using Matplotlib.

Output of MSE, RMSE, MAPE, and Directional Accuracy: