
Training Script

The provided sample training script, train.py, demonstrates the full lifecycle of training Reinforcement Learning (RL) models with Stable-Baselines3 against a TCP-connected Unreal simulation environment. It takes in a config file, establishes communication with the environment, and runs model training.


Overview

The Training Script performs the following (a minimal sketch of this flow appears after the list):

  • Parsing YAML configurations specifying hyperparameters and paths.
  • Establishing connections to Unreal Engine using TCP sockets and handling handshakes.
  • Wrapping environments using GymWrapper instances.
  • Executing training via Stable-Baselines3, supporting algorithms like PPO, A2C, SAC, TD3, and DDPG.
  • Automatic checkpointing.
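
As a rough illustration of the Stable-Baselines3 portion of that flow, the sketch below uses a standard Gymnasium environment in place of the repo's GymWrapper (whose TCP handshake and constructor are not shown) and placeholder values for the algorithm, paths, and frequencies; it is not the exact code in train.py.

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Placeholder environment; in train.py this would be the TCP-connected GymWrapper.
env = gym.make("Pendulum-v1")

# Periodic checkpointing, analogous to the automatic checkpointing done by the script.
checkpoint_cb = CheckpointCallback(
    save_freq=10_000,            # placeholder; the script derives this from checkpoint_freq in the config
    save_path="./checkpoints",   # placeholder for paths.save_dir
    name_prefix="demo_model",    # placeholder for model_name
)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000, callback=checkpoint_cb)
model.save("./checkpoints/demo_model_final")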

Usage

  1. Prepare Configuration File:
    • Create a YAML configuration file specifying the RL algorithm, hyperparameters, training steps, and checkpoint paths.
  2. Run the Training Script (internal handling of --config is sketched after this list):
    • cd PythonEnv
    • python train.py --config <Path to your config file>
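
Internally, the script reads the path passed via --config and loads the YAML file into a dictionary. A minimal sketch of that step, assuming standard argparse and PyYAML usage (the actual implementation in train.py may differ):

import argparse
import yaml

parser = argparse.ArgumentParser(description="Train an RL model against an Unreal environment")
parser.add_argument("--config", required=True, help="Path to the YAML configuration file")
args = parser.parse_args()

# Load the configuration into a plain dictionary keyed by the sections shown in the template below.
with open(args.config, "r") as f:
    config = yaml.safe_load(f)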

YAML Configuration Structure and Setup

The training process is configured through a .yaml file. Specify the supported model name in all lowercase at the very top of the file, then fill in the shared and model-specific parameters in the on-policy/off-policy sections. Most of the common parameters for each supported model are included below; for details on what each parameter means, or for additional parameters a model accepts, refer to the RL Algorithms section of the Stable-Baselines3 documentation. Note that hyperparameters not present in the config template must be added to the model creation section of the training script manually. A sketch of how the script might consume these values follows the template.

Template:

model: ppo/a2c/sac/td3/ddpg
model_name: # File name used when saving to save directory
load_from_checkpoint: # True tells the training script to continue training from a previously trained model
total_timesteps: # Total training timesteps
checkpoint_freq: # Number of timesteps before checkpointing

paths:
  save_dir: # Path to the directory where the model is saved; created automatically if it doesn't exist
  previous_model: # Path to the ZIP file of a previously trained model. ***Remember to set load_from_checkpoint to true***
common_parameters:
  device: 'auto'/'cuda'/'cpu' # If set to auto, training runs on the GPU when available
  learning_rate: # Step size used by the optimizer
  gamma: # Discount factor
  verbose: # 0-no msg / 1-info / 2-debug
  batch_size: # Mini-batch size for gradient updates (used by PPO and the off-policy models)

on_policy:
  # Below are common parameters shared by On Policy models
  n_steps: 2048
  gae_lambda: 0.95
  ent_coef: 0.00
  # Below are parameters specific to On Policy models
  ppo:
    n_epochs: 10
    clip_range: 0.2
    vf_coef: 0.5

off_policy:
  # Below are common parameters shared by Off Policy models
  buffer_size: 1_000_000
  tau: 0.005
  learning_starts: 100

  # Below are parameters specific to Off Policy models
  sac:
    ent_coef: auto

  td3:
    policy_delay: 2
    target_policy_noise: 0.2
    target_noise_clip: 0.5

  ddpg:
    action_noise: null
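
To illustrate how a config like the template above might be turned into a model, the hedged sketch below dispatches on the model key, merges the common and on/off-policy parameters, and honors load_from_checkpoint. The key names mirror the template, but the actual logic in train.py may be organized differently, and a Gymnasium environment again stands in for the TCP-connected GymWrapper.

import yaml
import gymnasium as gym
from stable_baselines3 import PPO, A2C, SAC, TD3, DDPG

ALGOS = {"ppo": PPO, "a2c": A2C, "sac": SAC, "td3": TD3, "ddpg": DDPG}

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

algo = cfg["model"]
kwargs = dict(cfg.get("common_parameters", {}))

# Merge the shared keys and the algorithm-specific sub-section from the on/off policy blocks.
section = cfg.get("on_policy", {}) if algo in ("ppo", "a2c") else cfg.get("off_policy", {})
kwargs.update({k: v for k, v in section.items() if not isinstance(v, dict)})
kwargs.update(section.get(algo, {}))

env = gym.make("Pendulum-v1")  # placeholder for the TCP-connected Unreal environment

if cfg.get("load_from_checkpoint"):
    # Continue training from the ZIP file referenced by paths.previous_model
    model = ALGOS[algo].load(cfg["paths"]["previous_model"], env=env)
else:
    model = ALGOS[algo]("MlpPolicy", env, **kwargs)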