Sample Training Script Guide
Training Script
The provided sample training script, train.py, demonstrates the full lifecycle of training a Reinforcement Learning (RL) model with Stable-Baselines3 against a TCP-connected Unreal Engine simulation environment. It takes in a config file, establishes communication with the environment, and executes model training.
Overview
The training script performs the following:
- Parsing YAML configurations specifying hyperparameters and paths.
- Establishing connections to Unreal Engine using TCP sockets and handling handshakes.
- Wrapping environments using GymWrapper instances.
- Executing training via Stable-Baselines3, supporting algorithms like PPO, A2C, SAC, TD3, and DDPG.
- Automatic checkpointing during training.
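Put together, the lifecycle looks roughly like the sketch below. It uses only standard Stable-Baselines3 calls; the GymWrapper import path, its constructor arguments, and the literal config keys are assumptions based on the template later on this page, and the exact code in train.py may differ.

```python
# Illustrative sketch of the train.py lifecycle, not the actual script.
import yaml
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# The GymWrapper module path and constructor arguments below are assumptions.
from gym_wrapper import GymWrapper

# 1. Parse the YAML configuration.
with open("config.yaml", "r") as f:
    cfg = yaml.safe_load(f)

# 2. Connect to the Unreal Engine simulation over TCP; the wrapper performs
#    the handshake and exposes a Gym-style reset/step interface.
env = GymWrapper(ip="127.0.0.1", port=7777)

# 3. Build the model from the parsed hyperparameters (PPO shown as an example).
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=cfg["common_parameters"]["learning_rate"],
    gamma=cfg["common_parameters"]["gamma"],
    verbose=cfg["common_parameters"]["verbose"],
)

# 4. Train with periodic checkpointing to the configured save directory.
checkpoint_cb = CheckpointCallback(
    save_freq=cfg["checkpoint_freq"],
    save_path=cfg["paths"]["save_dir"],
    name_prefix=cfg["model_name"],
)
model.learn(total_timesteps=cfg["total_timesteps"], callback=checkpoint_cb)

# 5. Save the final model as a .zip file.
model.save(f'{cfg["paths"]["save_dir"]}/{cfg["model_name"]}')
```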
Usage
- Prepare a configuration file:
  - Create a YAML configuration file specifying the RL algorithm, hyperparameters, training steps, and checkpoint paths.
- Run the training script:
```
cd PythonEnv
python train.py --config <Path to your config file>
```
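For reference, here is a minimal sketch of how the --config flag can be consumed and the YAML file loaded; the real train.py may define additional flags.

```python
import argparse
import yaml

# Parse the --config flag and load the YAML file it points to.
parser = argparse.ArgumentParser(description="Train an RL model against a UE simulation")
parser.add_argument("--config", required=True, help="Path to the YAML configuration file")
args = parser.parse_args()

with open(args.config, "r") as f:
    cfg = yaml.safe_load(f)

print(f"Training a {cfg['model']} model for {cfg['total_timesteps']} timesteps")
```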
Setting Up the YAML Configuration
The training process is configured through a .yaml file. Specify the supported model name in all lowercase at the very top of the file, then fill in the shared and model-specific parameters in the on_policy / off_policy sections. Most of the common parameters for each supported model are included in the template; for details on what each parameter means, or for other parameters a model supports, refer to the RL Algorithms section of the Stable-Baselines3 documentation. Note that hyperparameters not present in the config template below must be added to the model-creation section of the training script manually.
Template:
```yaml
model: ppo/a2c/sac/td3/ddpg
model_name: # File name used when saving to the save directory
load_from_checkpoint: # True tells the training script to continue training from a previously trained model
total_timesteps: # Total training timesteps
checkpoint_freq: # Number of timesteps between checkpoints

paths:
  save_dir: # Path to the directory where the model is saved; created automatically if it doesn't exist
  previous_model: # Path to the ZIP file of a previously trained model. ***Remember to set load_from_checkpoint to true***

common_parameters:
  device: 'auto'/'cuda'/'cpu' # If set to auto, the code will run on the GPU if possible
  learning_rate:
  gamma: # Discount factor
  verbose: # 0 - no output / 1 - info / 2 - debug
  batch_size: # Mini-batch size for gradient updates

on_policy:
  # Below are common parameters shared by on-policy models
  n_steps: 2048
  gae_lambda: 0.95
  ent_coef: 0.00
  # Below are parameters specific to on-policy models
  ppo:
    n_epochs: 10
    clip_range: 0.2
    vf_coef: 0.5

off_policy:
  # Below are common parameters shared by off-policy models
  buffer_size: 1_000_000
  tau: 0.005
  learning_starts: 100
  # Below are parameters specific to off-policy models
  sac:
    ent_coef: auto
  td3:
    policy_delay: 2
    target_policy_noise: 0.2
    target_noise_clip: 0.5
  ddpg:
    action_noise: null
```
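To illustrate how these sections can map onto model construction, the sketch below merges common_parameters with the on-/off-policy section that matches the chosen algorithm and honours load_from_checkpoint. This is an assumed structure for illustration only; the exact merging logic in train.py may differ, and the Gym environment here is a stand-in for the GymWrapper-connected Unreal environment.

```python
# Sketch of turning the parsed config into a Stable-Baselines3 model (assumed structure).
import gymnasium as gym
import yaml
from stable_baselines3 import A2C, DDPG, PPO, SAC, TD3

ALGOS = {"ppo": PPO, "a2c": A2C, "sac": SAC, "td3": TD3, "ddpg": DDPG}
ON_POLICY = {"ppo", "a2c"}

with open("config.yaml", "r") as f:
    cfg = yaml.safe_load(f)

name = cfg["model"]
section = cfg["on_policy"] if name in ON_POLICY else cfg["off_policy"]

# Shared parameters, then section-wide values, then algorithm-specific overrides.
kwargs = dict(cfg["common_parameters"])
kwargs.update({k: v for k, v in section.items() if not isinstance(v, dict)})
kwargs.update(section.get(name, {}))

# Stand-in environment; in train.py this would be the TCP-connected GymWrapper.
env = gym.make("Pendulum-v1")

algo_cls = ALGOS[name]
if cfg["load_from_checkpoint"]:
    # Resume training from the previously saved .zip checkpoint.
    model = algo_cls.load(cfg["paths"]["previous_model"], env=env, **kwargs)
else:
    model = algo_cls("MlpPolicy", env, **kwargs)
```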