Configuration & Hyperparameters - KunjShah01/RL-A2A GitHub Wiki

Configuration & Hyperparameters

RL-A2A uses configuration files (YAML/JSON) and command-line arguments to control experiments.

Key Hyperparameters

Learning Rate: 1e-3 (default, customizable)
Batch Size: 32-1024
Gamma (Discount Factor): 0.99
Lambda (GAE): 0.95
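Entropy Coefficient: Weight of the entropy bonus that encourages exploration (0.01 in the example config below)
Clip Range (PPO): 0.2 (applies only when using PPO)
Number of Actors: Adjustable for A2A (2 in the example config below)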
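
To make the roles of Gamma and Lambda concrete, here is a minimal sketch of Generalized Advantage Estimation (GAE) in plain Python, using the values listed above (0.99 and 0.95). The function name compute_gae and the toy rewards and value estimates are hypothetical; the source does not show how RL-A2A computes advantages, so treat this as an illustration of the standard technique rather than the project's code.

```python
from typing import List


def compute_gae(
    rewards: List[float],
    values: List[float],
    last_value: float,
    gamma: float = 0.99,  # discount factor from the list above
    lam: float = 0.95,    # GAE lambda from the list above
) -> List[float]:
    """Return GAE advantages for one trajectory with no termination inside it."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # value estimate for the state after the last step
    for t in reversed(range(len(rewards))):
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors, decayed by gamma * lam
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


# Toy data (hypothetical): three steps with reward 1.0 and a constant value estimate
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
advantages = compute_gae(rewards, values, last_value=0.0)
print(advantages)
```

Lowering lam toward 0 makes each advantage rely mostly on the one-step TD error (lower variance, more bias); raising it toward 1 moves the estimate toward the full discounted return.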

Example Config File

env: CartPole-v1
algo: A2A
learning_rate: 0.0005
gamma: 0.99
entropy_coef: 0.01
num_actors: 2
total_timesteps: 1000000
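
As a rough illustration of how such a file could be consumed, the sketch below loads a JSON equivalent of the config above, fills in defaults from the hyperparameter list, and runs a few sanity checks. The names DEFAULTS, load_config, gae_lambda, and clip_range are assumptions made for this example and are not taken from the RL-A2A code. JSON is used so the example needs only the standard library.

```python
import json

# Hypothetical defaults, taken from the "Key Hyperparameters" list
DEFAULTS = {
    "learning_rate": 1e-3,
    "gamma": 0.99,
    "gae_lambda": 0.95,  # assumed key name for Lambda (GAE)
    "clip_range": 0.2,   # assumed key name for Clip Range (PPO)
}

# JSON equivalent of the example config file
EXAMPLE_JSON = """
{
  "env": "CartPole-v1",
  "algo": "A2A",
  "learning_rate": 0.0005,
  "gamma": 0.99,
  "entropy_coef": 0.01,
  "num_actors": 2,
  "total_timesteps": 1000000
}
"""


def load_config(text: str) -> dict:
    """Parse a JSON config and merge it over the defaults."""
    config = dict(DEFAULTS)
    config.update(json.loads(text))  # values from the file win over defaults
    # Basic sanity checks on commonly mistyped values
    if config["learning_rate"] <= 0:
        raise ValueError("learning_rate must be positive")
    if not 0.0 < config["gamma"] <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    if config.get("num_actors", 1) < 1:
        raise ValueError("num_actors must be at least 1")
    return config


config = load_config(EXAMPLE_JSON)
print(config["learning_rate"], config["gae_lambda"])
```

Keys missing from the file (here gae_lambda and clip_range) fall back to the defaults, so a config only needs to list the values that differ.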

Overriding via CLI

python train.py --env CartPole-v1 --algo A2A --learning-rate 0.0005
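
One common way to implement this kind of override is to parse the flags with argparse and copy only the flags that were actually given over the values loaded from the config file. The sketch below assumes that pattern; build_parser and apply_overrides are hypothetical names, and the real train.py may work differently. Only the three flags shown in the command above are declared.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the example command; unset flags default to None."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parser")
    parser.add_argument("--env")
    parser.add_argument("--algo")
    # argparse stores "--learning-rate" under the attribute name "learning_rate"
    parser.add_argument("--learning-rate", type=float)
    return parser


def apply_overrides(config: dict, argv: List[str]) -> dict:
    """Return a copy of config with explicitly passed CLI flags taking precedence."""
    args = build_parser().parse_args(argv)
    merged = dict(config)
    for key, value in vars(args).items():
        if value is not None:  # flag was given on the command line
            merged[key] = value
    return merged


# Values as they might come from a config file (hypothetical)
file_config = {"env": "CartPole-v1", "algo": "A2A", "learning_rate": 0.001, "gamma": 0.99}
merged = apply_overrides(file_config, ["--learning-rate", "0.0005"])
print(merged)
```

Because unset flags stay None, omitting a flag leaves the config-file value untouched instead of resetting it to a parser default.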

Customization

Add new configs in the configs/ directory.
For advanced options, see Experimentation & Customization.