
# 📊 RL Agent and Trading System - Overview

The RL (Reinforcement Learning) Agent is the artificial brain of the trading system: it learns from market experience to optimize investment strategies.


## 🧠 Deep Q-Network (DQN) Architecture

### 🔬 Neural Network Model

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, input_dim, hidden_dim=256, output_dim=3):
        super(DQN, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),    # Layer 1: input → 256
            nn.ReLU(),
            nn.Dropout(0.2),                     # 20% dropout
            nn.Linear(hidden_dim, hidden_dim),   # Layer 2: 256 → 256
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim)    # Output: 3 actions
        )

    def forward(self, x):
        return self.network(x)
```
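
With the 42-feature state space described below, the network is instantiated as `DQN(input_dim=42)`. A quick usage sketch:

```python
model = DQN(input_dim=42)             # 42 state features in, 3 Q-values out
q_values = model(torch.randn(1, 42))  # shape: [1, 3]
action = q_values.argmax(1).item()    # 0 = HOLD, 1 = BUY, 2 = SELL
```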

### 🎯 Possible Actions

| Action | Value | Description |
|--------|-------|-------------|
| HOLD | 0 | Keep the current position |
| BUY | 1 | Buy / increase the position |
| SELL | 2 | Sell / reduce the position |
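
These indices map one-to-one onto the DQN's three output neurons. A minimal sketch of the mapping (the `Action` enum is an illustrative name, not taken from the repo):

```python
from enum import IntEnum

class Action(IntEnum):
    HOLD = 0  # keep the current position
    BUY = 1   # buy / increase the position
    SELL = 2  # sell / reduce the position

# e.g. Action(q_values.argmax(1).item()) -> Action.BUY
```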

### 📊 State Space (42 features)

The system feeds 42 technical features into every decision, grouped as follows (a sketch of how they are assembled into one state vector comes after the lists):

#### 💹 Price Data (10 features)

- Open, High, Low, Close, Adjusted Close
- Volume, volume moving average
- Price momentum (1d, 3d, 7d)

#### 📈 Technical Indicators (20 features)

- SMA: 10, 20, 50 periods
- EMA: 12, 26 periods
- RSI: 14 periods
- MACD: signal, histogram, difference
- Bollinger Bands: upper, lower, %B
- Stochastic: %K, %D
- Williams %R
- CCI (Commodity Channel Index)
- ATR (Average True Range)

#### 🔄 Portfolio Features (6 features)

- Available cash (normalized)
- Current position value
- Unrealized P&L
- Days in current position
- Risk exposure (%)
- Portfolio volatility

#### 📊 Market Features (6 features)

- Market volatility (VIX proxy)
- Trend strength
- Volume profile
- Support/resistance proximity
- Market correlation
- Sector performance
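
How the 10 + 20 + 6 + 6 = 42 values become one observation is not shown on this page; a minimal assembly sketch, assuming each group arrives as an already-normalized NumPy array:

```python
import numpy as np

def build_state(price_feats, tech_feats, portfolio_feats, market_feats):
    """Concatenate the four groups (10 + 20 + 6 + 6) into the 42-dim state."""
    state = np.concatenate([price_feats, tech_feats,
                            portfolio_feats, market_feats]).astype(np.float32)
    assert state.shape == (42,), f"expected 42 features, got {state.shape[0]}"
    return state
```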

## 🎓 Training Process

๐Ÿ‹๏ธ Algoritmo di Apprendimento

```python
import torch
import torch.nn.functional as F

def train_step(self, batch):
    # Batch tensors: states/next_states float, actions long, dones bool
    states, actions, rewards, next_states, dones = batch

    # Current Q-values for the actions actually taken
    current_q = self.network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target Q-values (Double DQN: online net selects, target net evaluates)
    with torch.no_grad():
        next_actions = self.network(next_states).argmax(1, keepdim=True)
        next_q = self.target_network(next_states).gather(1, next_actions).squeeze(1)
        target_q = rewards + self.gamma * next_q * (~dones)

    # Loss calculation
    loss = F.mse_loss(current_q, target_q)

    # Backpropagation
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
```

โš™๏ธ Hyperparameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| Learning rate | 0.001 | Optimizer step size |
| Gamma | 0.99 | Discount factor |
| Epsilon start | 1.0 | Initial exploration rate |
| Epsilon end | 0.05 | Final exploration rate |
| Epsilon decay | 0.995 | Multiplicative decay, applied each episode |
| Batch size | 32 | Samples per training step |
| Memory size | 10,000 | Experience replay buffer capacity |
| Target update | 100 | Steps between target network updates |
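
The epsilon parameters drive the usual epsilon-greedy trade-off between exploration and exploitation. A minimal sketch of the `agent.get_action(state, epsilon)` used in the training loop below (the exact implementation is not shown on this page):

```python
import random
import torch

def get_action(self, state, epsilon):
    # Explore: with probability epsilon, take a random action
    if random.random() < epsilon:
        return random.randrange(3)  # HOLD / BUY / SELL
    # Exploit: otherwise act greedily on the predicted Q-values
    with torch.no_grad():
        q_values = self.network(torch.as_tensor(state, dtype=torch.float32))
    return int(q_values.argmax().item())
```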

### 🎯 Reward Function

```python
def calculate_reward(self, action, old_portfolio, new_portfolio):
    # Portfolio return over the step
    portfolio_return = (new_portfolio - old_portfolio) / old_portfolio

    # Risk-adjusted return (Sharpe-like); the epsilon avoids division by zero
    volatility = self.calculate_volatility()
    risk_adjusted = portfolio_return / (volatility + 1e-8)

    # Action consistency bonus
    consistency_bonus = self.get_consistency_bonus(action)

    # Final reward
    reward = risk_adjusted * 100 + consistency_bonus

    # Penalty for excessive trading
    if self.trading_frequency > self.trading_threshold:
        reward -= self.trading_penalty

    return reward
```

## 📊 Trading Environment

๐Ÿฆ Portfolio Management

```python
class Portfolio:
    def __init__(self, initial_cash=10000):
        self.cash = initial_cash
        self.positions = {}            # symbol -> quantity held
        self.history = []              # executed trades
        self.transaction_cost = 0.001  # 0.1% per trade
```
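
A sketch of how a buy might debit cash including the 0.1% fee (the `buy` method and its signature are illustrative assumptions, not the repo's API):

```python
def buy(self, symbol, quantity, price):
    cost = quantity * price
    fee = cost * self.transaction_cost          # 0.1% commission
    if cost + fee > self.cash:
        return False                            # insufficient cash
    self.cash -= cost + fee
    self.positions[symbol] = self.positions.get(symbol, 0) + quantity
    self.history.append(("BUY", symbol, quantity, price, fee))
    return True
```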

### 📈 Performance Metrics

| Metric | Formula | Description |
|--------|---------|-------------|
| Total Return | (final_value - initial_value) / initial_value | Overall return |
| Sharpe Ratio | (mean_return - risk_free) / std_return | Risk-adjusted return |
| Max Drawdown | max(peak_to_trough_decline) | Largest peak-to-trough loss |
| Win Rate | profitable_trades / total_trades | % of profitable trades |
| Profit Factor | gross_profit / gross_loss | Profit-to-loss ratio |
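
The `calculate_sharpe` and `calculate_max_drawdown` helpers used in the evaluation and backtesting snippets below are not shown on this page; a minimal sketch over a series of daily returns (the 252-day annualization and zero default risk-free rate are assumptions):

```python
import numpy as np

def calculate_sharpe(returns, risk_free=0.0, periods_per_year=252):
    excess = np.asarray(returns) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / (excess.std() + 1e-8)

def calculate_max_drawdown(returns):
    equity = np.cumprod(1 + np.asarray(returns))  # equity curve from returns
    peaks = np.maximum.accumulate(equity)         # running peak so far
    drawdowns = (equity - peaks) / peaks          # <= 0 at every step
    return drawdowns.min()                        # most negative excursion
```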

### 📊 Risk Management

```python
# Position sizing: cap any single position at 20% of the portfolio
max_position_size = portfolio_value * 0.2
position_size = min(calculated_size, max_position_size)

# Stop-loss: exit when a position loses more than 5% of portfolio value
if unrealized_loss > portfolio_value * 0.05:
    execute_sell_order()

# Portfolio heat: keep at most 60% of the portfolio at risk
total_risk = sum(position_risk(p) for p in open_positions)
if total_risk > 0.6:
    reduce_positions()
```

## 🔄 Training Pipeline

### 1. 📚 Data Collection

```bash
# Download historical data for multiple symbols
python data_collector.py --symbols AAPL,GOOGL,MSFT,TSLA,AMZN
# Period: 5 years of data
# Frequency: daily OHLCV
```
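
What `data_collector.py` does internally is not shown here; a minimal sketch of an equivalent download, assuming the yfinance library:

```python
import yfinance as yf

symbols = ["AAPL", "GOOGL", "MSFT", "TSLA", "AMZN"]
# One DataFrame of daily OHLCV bars per symbol, covering the last 5 years
data = {s: yf.download(s, period="5y", interval="1d") for s in symbols}
```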

2. ๐Ÿ—๏ธ Environment Setup

```python
# Create the training environment
env = TradingEnv(
    data=historical_data,
    initial_cash=10000,
    transaction_cost=0.001,
    lookback_window=30
)
```
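
The interface implied by the training loop below is `reset()` plus a 3-tuple `step()`. A skeleton under that assumption (the private helpers are elided placeholders, not repo code):

```python
class TradingEnv:
    def __init__(self, data, initial_cash=10000, transaction_cost=0.001,
                 lookback_window=30):
        self.data = data                     # historical OHLCV bars
        self.initial_cash = initial_cash
        self.transaction_cost = transaction_cost
        self.lookback_window = lookback_window

    def reset(self):
        self.t = self.lookback_window        # skip the indicator warm-up window
        self.portfolio = Portfolio(self.initial_cash)
        return self._observe()               # 42-dim state for the current bar

    def step(self, action):
        old_value = self._portfolio_value()
        self._execute(action)                # apply HOLD/BUY/SELL at this bar
        self.t += 1
        reward = self._reward(action, old_value, self._portfolio_value())
        done = self.t >= len(self.data) - 1  # episode ends at the last bar
        return self._observe(), reward, done
```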

### 3. 🎓 Training Loop

```python
for episode in range(num_episodes):
    state = env.reset()
    total_reward = 0

    for step in range(max_steps):
        # Epsilon-greedy action selection
        action = agent.get_action(state, epsilon)

        # Execute action
        next_state, reward, done = env.step(action)

        # Store experience
        agent.memory.push(state, action, reward, next_state, done)

        # Train agent
        if len(agent.memory) > batch_size:
            agent.train()

        state = next_state
        total_reward += reward

        if done:
            break

    # Update epsilon
    epsilon = max(epsilon_end, epsilon * epsilon_decay)
```
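
One item from the hyperparameter table does not appear in this loop: syncing the target network every 100 steps. A sketch of where that would go (the `global_step` counter and `target_update` constant are illustrative names):

```python
# Inside the step loop, right after agent.train():
global_step += 1
if global_step % target_update == 0:  # target_update = 100, per the table
    agent.target_network.load_state_dict(agent.network.state_dict())
```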

### 4. 📊 Evaluation

```python
# Test on out-of-sample data
test_returns = evaluate_agent(agent, test_data)
sharpe_ratio = calculate_sharpe(test_returns)
max_drawdown = calculate_max_drawdown(test_returns)
```

## 🎯 Strategy Engine Integration

### 🔗 Multi-Strategy Framework

```python
class StrategyEngine:
    def __init__(self):
        self.strategies = {
            'rl_agent': RLAgent(),
            'technical': TechnicalStrategy(),
            'news_sentiment': NewsSentimentStrategy()
        }

    def get_combined_signal(self, symbol):
        signals = {}
        for name, strategy in self.strategies.items():
            signals[name] = strategy.generate_signal(symbol)

        # Weighted combination
        final_signal = (
            signals['rl_agent'] * 0.5 +
            signals['technical'] * 0.3 +
            signals['news_sentiment'] * 0.2
        )

        return final_signal
```

โš–๏ธ Signal Weights

| Strategy | Weight | Role |
|----------|--------|------|
| RL Agent | 50% | Primary decision signal |
| Technical Analysis | 30% | Trend confirmation |
| News Sentiment | 20% | External context |
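
How the weighted `final_signal` is mapped back to a trade is not specified here; a minimal sketch, assuming each strategy emits a signal in [-1, 1] and using an illustrative ±0.3 threshold:

```python
def signal_to_action(final_signal, threshold=0.3):
    if final_signal > threshold:    # strong combined buy signal
        return Action.BUY
    if final_signal < -threshold:   # strong combined sell signal
        return Action.SELL
    return Action.HOLD              # no conviction either way
```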

## 📈 Performance Monitoring

### 📊 Real-time Metrics

```python
# Dashboard metrics
current_portfolio_value = portfolio.get_total_value()
daily_return = calculate_daily_return()
sharpe_ratio = calculate_rolling_sharpe(window=30)
drawdown = calculate_current_drawdown()

# Alert thresholds
if daily_return < -0.05:  # -5% daily loss
    send_alert("CRITICAL: Daily loss exceeds 5%")

if drawdown > 0.15:  # 15% drawdown
    send_alert("WARNING: Drawdown exceeds 15%")
```

### 📈 Performance Charts

- Portfolio Value: portfolio value over time
- Daily Returns: distribution of daily returns
- Drawdown: drawdown curves
- Action Distribution: action frequency (HOLD/BUY/SELL)
- Win/Loss Ratio: analysis of profitable trades

๐Ÿ› ๏ธ Model Persistence

### 💾 Saving/Loading Models

```python
# Save the trained model
torch.save({
    'model_state_dict': agent.network.state_dict(),
    'optimizer_state_dict': agent.optimizer.state_dict(),
    'training_metrics': training_history,
    'hyperparameters': config
}, 'models/rl_model.pth')

# Load the model (restore the optimizer too when resuming training)
checkpoint = torch.load('models/rl_model.pth')
agent.network.load_state_dict(checkpoint['model_state_dict'])
agent.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
```

### 🔄 Model Versioning

- v1.0: Base DQN implementation
- v1.1: Added technical indicators
- v1.2: Improved reward function
- v1.3: Portfolio risk management
- v2.0: Multi-strategy integration

## 🧪 Backtesting Framework

### 📊 Historical Testing

```python
# Backtest multiple periods
results = {}
for year in [2019, 2020, 2021, 2022, 2023]:
    data = get_year_data(year)
    returns = backtest_agent(agent, data)
    results[year] = {
        'total_return': calculate_total_return(returns),
        'sharpe_ratio': calculate_sharpe(returns),
        'max_drawdown': calculate_max_drawdown(returns)
    }
```

### 📈 Benchmark Comparison

| Strategy | Total Return | Sharpe Ratio | Max Drawdown |
|----------|--------------|--------------|--------------|
| RL Agent | 15.3% | 1.24 | -8.2% |
| Buy & Hold | 12.1% | 0.89 | -12.5% |
| SMA Crossover | 8.7% | 0.76 | -15.1% |
| Random | 2.3% | 0.12 | -25.3% |

## 🔮 Future Enhancements

### 🚀 Planned Features

1. Multi-Asset Support: trading a diversified portfolio
2. Ensemble Methods: combining multiple neural networks
3. Transfer Learning: applying pre-trained models
4. Real-time Learning: continuous model updates
5. Alternative Algorithms: testing A3C, PPO, SAC

### 🎯 Research Areas

- Feature Engineering: new technical indicators
- Reward Engineering: more sophisticated reward functions
- Market Regime Detection: adapting to changing market conditions
- Risk-Aware Learning: incorporating risk constraints into training

The RL Agent is the intelligent core of the system, combining deep learning with accumulated trading experience to drive its decisions.