RL Agent and Trading System - Overview
The RL (Reinforcement Learning) Agent is the artificial brain of the trading system: it learns from market experience to optimize the investment strategies.
Deep Q-Network (DQN) Architecture
Neural Network Model
```python
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, input_dim, hidden_dim=256, output_dim=3):
        super(DQN, self).__init__()
        self.network = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),   # Layer 1: input -> 256
            nn.ReLU(),
            nn.Dropout(0.2),                    # 20% dropout
            nn.Linear(hidden_dim, hidden_dim),  # Layer 2: 256 -> 256
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, output_dim)   # Output: Q-values for the 3 actions
        )

    def forward(self, x):
        return self.network(x)
```
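During trading the agent turns these Q-values into one of the three actions with an epsilon-greedy policy (the epsilon values are listed in the hyperparameter table below). A minimal sketch of such a selection method, assuming it lives on the agent alongside `self.network`; the body is illustrative, only the `get_action(state, epsilon)` signature appears later in the training loop:

```python
import random
import torch

def get_action(self, state, epsilon):
    # Explore: with probability epsilon pick a random action (0=HOLD, 1=BUY, 2=SELL)
    if random.random() < epsilon:
        return random.randint(0, 2)
    # Exploit: pick the action with the highest predicted Q-value
    with torch.no_grad():
        state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        q_values = self.network(state_t)
        return int(q_values.argmax(dim=1).item())
```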
Possible Actions

| Action | Value | Description |
|--------|-------|-------------|
| HOLD   | 0     | Keep the current position |
| BUY    | 1     | Buy / increase the position |
| SELL   | 2     | Sell / reduce the position |
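For readability, this mapping can be expressed as a small enum; the `Action` class below is an illustrative convenience, not part of the documented codebase, but the integer values match the table above:

```python
from enum import IntEnum

class Action(IntEnum):
    HOLD = 0  # keep the current position
    BUY = 1   # buy / increase the position
    SELL = 2  # sell / reduce the position
```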
State Space (42 features)
The network receives 42 input features for every decision, grouped into the four blocks below (a sketch of how they are assembled into the state vector follows the lists):
Price Data (10 features)
- Open, High, Low, Close, Adjusted Close
- Volume, Volume moving average
- Price momentum (1d, 3d, 7d)
Technical Indicators (20 features)
- SMA: 10, 20, 50 periods
- EMA: 12, 26 periods
- RSI: 14 periods
- MACD: Signal, Histogram, Difference
- Bollinger Bands: Upper, Lower, %B
- Stochastic: %K, %D
- Williams %R
- CCI (Commodity Channel Index)
- ATR (Average True Range)
Portfolio Features (6 features)
- Available cash (normalized)
- Current value of open positions
- Unrealized P&L
- Number of days in position
- Risk exposure (%)
- Portfolio volatility
Market Features (6 features)
- Market volatility (VIX proxy)
- Trend strength
- Volume profile
- Support/Resistance proximity
- Market correlation
- Sector performance
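A minimal sketch of how the four feature groups could be concatenated into the 42-dimensional state vector. The helper below and its argument names are illustrative assumptions; only the 10 + 20 + 6 + 6 layout comes from the lists above:

```python
import numpy as np

def build_state(price_feats, indicator_feats, portfolio_feats, market_feats):
    # price: 10 values, indicators: 20, portfolio: 6, market: 6 -> 42 in total
    state = np.concatenate([price_feats, indicator_feats, portfolio_feats, market_feats])
    assert state.shape == (42,), f"expected 42 features, got {state.shape[0]}"
    return state.astype(np.float32)
```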
Training Process
Learning Algorithm
```python
import torch
import torch.nn.functional as F

def train_step(self, batch):
    # actions, rewards and dones are assumed to be 1-D tensors of length batch_size
    states, actions, rewards, next_states, dones = batch

    # Current Q-values for the actions that were actually taken
    current_q = self.network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target Q-values (Double DQN): the online network selects the next action,
    # the target network evaluates it
    with torch.no_grad():
        next_actions = self.network(next_states).argmax(1)
        next_q = self.target_network(next_states).gather(1, next_actions.unsqueeze(1)).squeeze(1)
        target_q = rewards + self.gamma * next_q * (~dones)

    # Loss calculation
    loss = F.mse_loss(current_q, target_q)

    # Backpropagation
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()
```
Hyperparameters

| Parameter | Value | Description |
|-----------|-------|-------------|
| Learning Rate | 0.001 | Optimizer learning rate |
| Gamma | 0.99 | Discount factor |
| Epsilon Start | 1.0 | Initial exploration rate |
| Epsilon End | 0.05 | Final exploration rate |
| Epsilon Decay | 0.995 | Decay rate applied to epsilon |
| Batch Size | 32 | Samples per training step |
| Memory Size | 10,000 | Experience replay buffer capacity |
| Target Update | 100 | Steps between target network updates |
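The Memory Size and Target Update entries refer to the experience replay buffer and the periodic copy of the online network into the target network. A minimal sketch of both, assuming the (state, action, reward, next_state, done) tuples used by the training loop; the class and function names are illustrative (the loop only requires `memory.push` and `len(memory)`):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences drop out when full

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks the temporal correlation between consecutive steps;
        # the sampled tuples still need to be stacked into tensors before train_step.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def update_target_network(agent):
    # Hard update, typically run every 100 training steps (see Target Update above)
    agent.target_network.load_state_dict(agent.network.state_dict())
```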
Reward Function

```python
def calculate_reward(self, action, old_portfolio, new_portfolio):
    # Portfolio return over the step
    portfolio_return = (new_portfolio - old_portfolio) / old_portfolio

    # Risk-adjusted return (Sharpe-like); the epsilon avoids division by zero
    volatility = self.calculate_volatility()
    risk_adjusted = portfolio_return / (volatility + 1e-8)

    # Action consistency bonus
    consistency_bonus = self.get_consistency_bonus(action)

    # Final reward
    reward = risk_adjusted * 100 + consistency_bonus

    # Penalty for excessive trading (threshold and trading_penalty are
    # configuration values defined elsewhere in the agent)
    if self.trading_frequency > threshold:
        reward -= trading_penalty

    return reward
```
Trading Environment
Portfolio Management

```python
class Portfolio:
    def __init__(self, initial_cash=10000):
        self.cash = initial_cash
        self.positions = {}            # symbol -> number of shares held
        self.history = []              # record of executed trades
        self.transaction_cost = 0.001  # 0.1% commission per trade
```
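How the 0.1% transaction cost might be applied when a buy order is filled; the `buy` method below is an illustrative sketch, not part of the documented class, and assumes `positions` maps symbols to share counts:

```python
def buy(self, symbol, price, shares):
    cost = price * shares
    fee = cost * self.transaction_cost  # 0.1% commission on the traded notional
    if cost + fee > self.cash:
        return False  # not enough cash to fill the order
    self.cash -= cost + fee
    self.positions[symbol] = self.positions.get(symbol, 0) + shares
    self.history.append({'side': 'BUY', 'symbol': symbol, 'price': price,
                         'shares': shares, 'fee': fee})
    return True
```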
Performance Metrics

| Metric | Formula | Description |
|--------|---------|-------------|
| Total Return | (final_value - initial_value) / initial_value | Total return over the period |
| Sharpe Ratio | (mean_return - risk_free) / std_return | Risk-adjusted return |
| Max Drawdown | max(peak_to_trough_decline) | Largest peak-to-trough loss |
| Win Rate | profitable_trades / total_trades | Percentage of profitable trades |
| Profit Factor | gross_profit / gross_loss | Ratio of gross profits to gross losses |
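A minimal sketch of the Sharpe ratio and max drawdown formulas applied to a series of per-period returns, matching the helper names used in the evaluation and backtesting snippets below; the annualization over 252 trading days is an assumption:

```python
import numpy as np

def calculate_sharpe(returns, risk_free=0.0, periods_per_year=252):
    # (mean_return - risk_free) / std_return, annualized
    excess = np.asarray(returns) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / (excess.std() + 1e-8)

def calculate_max_drawdown(returns):
    # Largest peak-to-trough decline of the cumulative equity curve (negative value)
    equity = np.cumprod(1 + np.asarray(returns))
    peaks = np.maximum.accumulate(equity)
    drawdowns = (equity - peaks) / peaks
    return drawdowns.min()
```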
Risk Management

```python
# Position sizing: cap any single position at 20% of portfolio value
max_position_size = portfolio_value * 0.2
position_size = min(calculated_size, max_position_size)

# Stop-loss: close the position if the unrealized loss exceeds 5% of portfolio value
if unrealized_loss > portfolio_value * 0.05:
    execute_sell_order()

# Portfolio heat: keep at most 60% of the portfolio at risk at any time
total_risk = sum(position.risk for position in portfolio)
if total_risk > 0.6:
    reduce_positions()
```
Training Pipeline
1. Data Collection

```bash
# Download historical data for multiple symbols
python data_collector.py --symbols AAPL,GOOGL,MSFT,TSLA,AMZN
# Period: 5 years of data
# Frequency: daily OHLCV
```
2. Environment Setup

```python
# Create the training environment
env = TradingEnv(
    data=historical_data,
    initial_cash=10000,
    transaction_cost=0.001,
    lookback_window=30
)
```
3. Training Loop

```python
for episode in range(num_episodes):
    state = env.reset()
    total_reward = 0

    for step in range(max_steps):
        # Epsilon-greedy action selection
        action = agent.get_action(state, epsilon)

        # Execute the action
        next_state, reward, done = env.step(action)

        # Store the experience
        agent.memory.push(state, action, reward, next_state, done)

        # Train the agent once enough experiences are available
        if len(agent.memory) > batch_size:
            agent.train()

        state = next_state
        total_reward += reward
        if done:
            break

    # Decay epsilon towards its floor
    epsilon = max(epsilon_end, epsilon * epsilon_decay)
```
4. Evaluation

```python
# Test on out-of-sample data
test_returns = evaluate_agent(agent, test_data)
sharpe_ratio = calculate_sharpe(test_returns)
max_drawdown = calculate_max_drawdown(test_returns)
```
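A minimal sketch of what `evaluate_agent` might do: run the agent greedily (epsilon = 0) over the held-out data and collect per-step portfolio returns. It assumes the `env.step` convention from the training loop above and that the environment exposes its `Portfolio` instance as `env.portfolio`:

```python
def evaluate_agent(agent, test_data):
    env = TradingEnv(data=test_data, initial_cash=10000,
                     transaction_cost=0.001, lookback_window=30)
    returns = []
    state = env.reset()
    done = False
    prev_value = env.portfolio.get_total_value()
    while not done:
        action = agent.get_action(state, epsilon=0.0)  # greedy, no exploration
        state, reward, done = env.step(action)
        value = env.portfolio.get_total_value()
        returns.append((value - prev_value) / prev_value)
        prev_value = value
    return returns
```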
Strategy Engine Integration
Multi-Strategy Framework

```python
class StrategyEngine:
    def __init__(self):
        self.strategies = {
            'rl_agent': RLAgent(),
            'technical': TechnicalStrategy(),
            'news_sentiment': NewsSentimentStrategy()
        }

    def get_combined_signal(self, symbol):
        signals = {}
        for name, strategy in self.strategies.items():
            signals[name] = strategy.generate_signal(symbol)

        # Weighted combination of the three strategy signals
        final_signal = (
            signals['rl_agent'] * 0.5 +
            signals['technical'] * 0.3 +
            signals['news_sentiment'] * 0.2
        )
        return final_signal
```
Signal Weights

| Strategy | Weight | Role |
|----------|--------|------|
| RL Agent | 50% | Primary decision maker |
| Technical Analysis | 30% | Trend confirmation |
| News Sentiment | 20% | External context |
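One way the weighted signal could be mapped back onto the three actions; the ±0.3 thresholds and the assumption that each strategy emits a signal in [-1, 1] are illustrative, not values documented here:

```python
def signal_to_action(final_signal, buy_threshold=0.3, sell_threshold=-0.3):
    if final_signal >= buy_threshold:
        return 1   # BUY
    if final_signal <= sell_threshold:
        return 2   # SELL
    return 0       # HOLD
```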
Performance Monitoring
Real-time Metrics

```python
# Dashboard metrics
current_portfolio_value = portfolio.get_total_value()
daily_return = calculate_daily_return()
sharpe_ratio = calculate_rolling_sharpe(window=30)
drawdown = calculate_current_drawdown()

# Alert thresholds
if daily_return < -0.05:  # -5% daily loss
    send_alert("CRITICAL: Daily loss exceeds 5%")

if drawdown > 0.15:  # 15% drawdown
    send_alert("WARNING: Drawdown exceeds 15%")
```
Performance Charts
- Portfolio Value: portfolio value over time
- Daily Returns: distribution of daily returns
- Drawdown: drawdown curves
- Action Distribution: frequency of actions (Hold/Buy/Sell)
- Win/Loss Ratio: analysis of profitable trades
Model Persistence
Saving/Loading Models

```python
# Save the trained model
torch.save({
    'model_state_dict': agent.network.state_dict(),
    'optimizer_state_dict': agent.optimizer.state_dict(),
    'training_metrics': training_history,
    'hyperparameters': config
}, 'models/rl_model.pth')

# Load the model
checkpoint = torch.load('models/rl_model.pth')
agent.network.load_state_dict(checkpoint['model_state_dict'])
```
Model Versioning
- v1.0: Base DQN implementation
- v1.1: Added technical indicators
- v1.2: Improved reward function
- v1.3: Portfolio risk management
- v2.0: Multi-strategy integration
Backtesting Framework
Historical Testing

```python
# Backtest over multiple yearly periods
results = {}
for year in [2019, 2020, 2021, 2022, 2023]:
    data = get_year_data(year)
    returns = backtest_agent(agent, data)
    results[year] = {
        'total_return': calculate_total_return(returns),
        'sharpe_ratio': calculate_sharpe(returns),
        'max_drawdown': calculate_max_drawdown(returns)
    }
```
Benchmark Comparison

| Strategy | Total Return | Sharpe Ratio | Max Drawdown |
|----------|--------------|--------------|--------------|
| RL Agent | 15.3% | 1.24 | -8.2% |
| Buy & Hold | 12.1% | 0.89 | -12.5% |
| SMA Crossover | 8.7% | 0.76 | -15.1% |
| Random | 2.3% | 0.12 | -25.3% |
Future Enhancements
Planned Features
- Multi-Asset Support: trading across a diversified portfolio
- Ensemble Methods: combining multiple neural networks
- Transfer Learning: applying pre-trained models
- Real-time Learning: continuous model updates
- Alternative Algorithms: testing A3C, PPO, SAC
Research Areas
- Feature Engineering: new technical indicators
- Reward Engineering: more sophisticated reward functions
- Market Regime Detection: adapting to changing market conditions
- Risk-Aware Learning: incorporating risk constraints into training
Resources & Documentation
The RL Agent is the intelligent core of the system, combining deep learning with trading experience to reach optimal decisions.