Configuration & Hyperparameters - KunjShah01/RL-A2A GitHub Wiki
Configuration & Hyperparameters
RL-A2A uses configuration files (YAML/JSON) and command-line arguments to control experiments.
Key Hyperparameters
- Learning Rate: 1e-3 (default, customizable)
- Batch Size: 32 to 1024
- Gamma (Discount Factor): 0.99
- Lambda (GAE): 0.95
- Entropy Coefficient: weight of the entropy bonus that encourages exploration
- Clip Range (PPO): 0.2 (only if using PPO)
- Number of Actors: adjustable for A2A
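To make the roles of Gamma and Lambda concrete, here is a minimal sketch of the standard Generalized Advantage Estimation (GAE) recursion. This page does not show how RL-A2A computes advantages, so the function name `compute_gae` and its argument layout are assumptions for illustration only, not the project's API.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Standard GAE over one rollout (hypothetical helper, not RL-A2A code).

    rewards[t], values[t], dones[t] describe step t; last_value is the
    value estimate for the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    next_advantage = 0.0
    # Walk the rollout backwards so each step can reuse the next step's result.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; gamma discounts the bootstrapped value.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # lam controls how much of the later TD errors is mixed in.
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```

With Lambda close to 1 the estimate leans on long reward sums (lower bias, higher variance). With Lambda close to 0 it reduces to the one-step TD error.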
Example Config File
env: CartPole-v1
algo: A2A
learning_rate: 0.0005
gamma: 0.99
entropy_coef: 0.01
num_actors: 2
total_timesteps: 1000000
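A config like the one above could be loaded and validated roughly as follows. This is a sketch under stated assumptions: `ExperimentConfig` and `load_config` are hypothetical names, the default values not listed under Key Hyperparameters are simply copied from the example file, and JSON is used so the snippet needs only the standard library. For a YAML file, `yaml.safe_load` from PyYAML would take the place of `json.loads`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ExperimentConfig:
    """Hypothetical container for the keys in the example config."""
    env: str
    algo: str
    learning_rate: float = 1e-3       # default stated on this page
    gamma: float = 0.99               # default stated on this page
    entropy_coef: float = 0.01        # assumption: taken from the example file
    num_actors: int = 2               # assumption: taken from the example file
    total_timesteps: int = 1_000_000  # assumption: taken from the example file


def load_config(text: str) -> ExperimentConfig:
    """Parse a JSON config string and reject unknown or invalid entries."""
    raw = json.loads(text)
    known = {f.name for f in fields(ExperimentConfig)}
    unknown = set(raw) - known
    if unknown:
        # Fail early on misspelled keys instead of silently ignoring them.
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    cfg = ExperimentConfig(**raw)
    if not 0.0 < cfg.gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    if cfg.learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    return cfg
```

Rejecting unknown keys catches typos such as `learning-rate` in a config file, which would otherwise leave the default in effect without any warning.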
Overriding via CLI
python train.py --env CartPole-v1 --algo A2A --learning-rate 0.0005
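One plausible way for command-line flags to take precedence over file values is sketched below. Only the flags `--env`, `--algo`, and `--learning-rate` appear on this page. The helper names `build_parser` and `apply_overrides`, and the merge order, are assumptions and may differ from what `train.py` actually does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the three flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Training options (sketch)")
    # Every flag defaults to None so that "not given" can be told apart
    # from an explicit value.
    parser.add_argument("--env", default=None)
    parser.add_argument("--algo", default=None)
    parser.add_argument("--learning-rate", dest="learning_rate",
                        type=float, default=None)
    return parser


def apply_overrides(file_config: dict, argv: list) -> dict:
    """Return a copy of file_config with CLI values layered on top."""
    args = vars(build_parser().parse_args(argv))
    merged = dict(file_config)
    # Only flags the user actually passed replace values from the file.
    merged.update({key: value for key, value in args.items()
                   if value is not None})
    return merged
```

For example, `apply_overrides({"env": "CartPole-v1", "learning_rate": 0.001}, ["--learning-rate", "0.0005"])` keeps `env` from the file and replaces only the learning rate.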
Customization
- Add new configs in the configs/ directory.
- For advanced options, see Experimentation & Customization.