Configuration & Hyperparameters - KunjShah01/RL-A2A GitHub Wiki

Configuration & Hyperparameters

RL-A2A uses configuration files (YAML/JSON) and command-line arguments to control experiments.

Key Hyperparameters

  • Learning Rate: 1e-3 (default, customizable)
  • Batch Size: 32-1024
  • Gamma (Discount Factor): 0.99
  • Lambda (GAE): 0.95
  • Entropy Coefficient: Weight of the entropy bonus that encourages exploration (0.01 in the example config)
  • Clip Range (PPO): 0.2 (if using PPO)
  • Number of Actors: Adjustable for A2A (2 in the example config)
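
To make the roles of Gamma and Lambda concrete, here is a minimal sketch of Generalized Advantage Estimation (GAE) in plain Python. It is an illustration of the standard technique, not code taken from RL-A2A: the function name `compute_gae` and its arguments are hypothetical, and whether the project computes advantages exactly this way is an assumption.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages for one trajectory segment without episode ends.

    rewards[t] and values[t] are the reward and value estimate at step t;
    last_value is the value estimate for the state after the final step.
    (Hypothetical helper; names are not from the RL-A2A codebase.)
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the outcome was than predicted.
        delta = rewards[t] + gamma * next_value - values[t]
        # gamma discounts future TD errors; lam controls how far ahead they count.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Usage with the defaults listed above (gamma=0.99, lam=0.95).
advantages = compute_gae(rewards=[1.0, 1.0, 1.0], values=[0.5, 0.5, 0.5], last_value=0.0)
```

A lower Lambda leans on the value estimates (less variance, more bias), while a Lambda close to 1 leans on the observed returns.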

Example Config File

env: CartPole-v1
algo: A2A
learning_rate: 0.0005
gamma: 0.99
entropy_coef: 0.01
num_actors: 2
total_timesteps: 1000000
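
One way such a file might be read and validated is sketched below. This is an assumption about how a loader could look, not RL-A2A's actual implementation: `TrainConfig` and `load_config` are hypothetical names, and the sketch reads the JSON form of the config using only the standard library. For the YAML form, `yaml.safe_load` from PyYAML could replace `json.load`.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical container. The learning-rate default comes from the
    # hyperparameter list; the other defaults mirror the example config.
    env: str = "CartPole-v1"
    algo: str = "A2A"
    learning_rate: float = 1e-3
    gamma: float = 0.99
    entropy_coef: float = 0.01
    num_actors: int = 2
    total_timesteps: int = 1_000_000


def load_config(path):
    """Read a JSON config file and reject keys the dataclass does not know."""
    with open(path, "r", encoding="utf-8") as fh:
        raw = json.load(fh)
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        # Failing early catches typos such as "learning-rate" or "lr".
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    # Keys missing from the file fall back to the dataclass defaults.
    return TrainConfig(**raw)
```

Rejecting unknown keys is a design choice: a misspelled hyperparameter then raises an error instead of silently running with the default.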

Overriding via CLI

python train.py --env CartPole-v1 --algo A2A --learning-rate 0.0005
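
The sketch below shows one way command-line flags could take precedence over values from a config file. The flags `--env`, `--algo`, and `--learning-rate` are the ones used in the command above; the helper names `build_parser` and `apply_overrides`, and the rule that a flag wins only when it is actually passed, are assumptions rather than the documented behavior of `train.py`.

```python
import argparse


def build_parser():
    # Defaults are None so we can tell "flag not given" from "flag given".
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument parsing")
    parser.add_argument("--env", default=None)
    parser.add_argument("--algo", default=None)
    parser.add_argument("--learning-rate", type=float, default=None)
    return parser


def apply_overrides(config, args):
    """Return a copy of config where every flag that was passed replaces the file value."""
    merged = dict(config)
    for key, value in vars(args).items():
        if value is not None:
            merged[key] = value
    return merged


# Values as they might come from the config file (learning rate left at the 1e-3 default).
file_config = {"env": "CartPole-v1", "algo": "A2A", "learning_rate": 1e-3, "gamma": 0.99}

# Same flags as the command line shown above.
args = build_parser().parse_args(
    ["--env", "CartPole-v1", "--algo", "A2A", "--learning-rate", "0.0005"]
)
merged = apply_overrides(file_config, args)
```

Note that argparse stores `--learning-rate` under the attribute `learning_rate`, so it lines up with the `learning_rate` key used in the config file.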

Customization