Run Commands
Commands
To initiate training, you must set the following environment variables. This step must be repeated every time you train a brand-new RL policy.
Set Java Paths
export JAVA_HOME=`pwd`/jdk8u222-b10
export JDK_HOME=$JAVA_HOME
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64/:$LD_LIBRARY_PATH
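As an optional sanity check (not part of the original setup steps), confirm that the bundled JDK is now the one on your path:
# Both should point at the bundled jdk8u222-b10, not a system-wide Java
java -version
echo "$JAVA_HOME"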
Initialize Conda Environment
source conda/bin/activate
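If activation succeeded, the active Python should resolve inside the conda directory (again, an optional check):
which python # should point inside conda/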
Training Configuration
export REWARD_TERMS_SNIPPET='rewardTermsRaw[0] = after.goalReached * 0.1;' # Set your reward function here (see the example after this block)
export MAX_TIME_IN_SEC='43200' # Max training time in seconds (default is 43200, i.e. 12 hours)
export NUM_SAMPLES='4' # Parallel trials. We recommend 4.
export MAX_ITERATIONS='500' # Max training iterations (500 is usually sufficient)
export CHECKPOINT_FREQUENCY='25'
export RESUME=${RESUME:='false'}
export USER_LOG='true'
export SCHEDULER='PBT' # Population-Based Training: https://www.deepmind.com/blog/population-based-training-of-neural-networks
export EPISODE_REWARD_RANGE='0.01'
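REWARD_TERMS_SNIPPET can hold more than one term, assuming the rewardTermsRaw array accepts additional indices. The sketch below is illustrative only: goalReached and totalDelay are hypothetical fields of your own observation class, not names guaranteed to exist in your model.
# Hypothetical two-term reward: reward reaching the goal, penalize accumulated delay
export REWARD_TERMS_SNIPPET='rewardTermsRaw[0] = after.goalReached * 0.1; rewardTermsRaw[1] = -after.totalDelay * 0.01;'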
Compute Configuration
export NUM_WORKERS='1' # If you have additional CPU cores, we recommend setting this to 3 or 4.
export NUM_CPUS='1' # Doesn't do much, so keep this at 1.
Simulation Configuration
export MAIN_AGENT='Main' # The name of your main agent (typically "Main")
export EXPERIMENT_CLASS='Simulation' # The name of your simulation agent (typically "Simulation")
export EXPERIMENT_TYPE='Simulation'
export MAX_MEMORY_IN_MB='4096'
RL Configuration
export MULTIAGENT='false' # Set to 'true' for multi-agent training (see the example after this block)
export FREEZING='false'
export ACTIONMASKS='false' # Set to 'true' if you are using action masking
export TRAIN_BATCH_MODE='complete_episodes'
export NUM_HIDDEN_NODES='256'
export NUM_HIDDEN_LAYERS='2'
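For example, a multi-agent model that uses action masking would flip the two relevant flags; the remaining settings in this block can usually stay at their defaults. This is a sketch, not a required configuration:
# Enable multi-agent training with action masking
export MULTIAGENT='true'
export ACTIONMASKS='true'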
Required by AnyLogic (Otherwise Does Nothing)
export NAMED_VARIABLE='true'
echo > setup.sh
mkdir -p database
touch database/db.properties
Execute Training
bash train.sh
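train.sh blocks the current shell for the duration of training. A common pattern (not something the script requires) is to run it in the background and capture output to a log file you can follow:
# Optional: run training in the background and keep a log
nohup bash train.sh > training.log 2>&1 &
tail -f training.log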
Key Configurations
You will not need to touch most of these configurations. However, some are crucial to ensure the policy learns well.
Training Duration
The settings below determine the duration of training. These should be good enough for 90% of use cases. For particularly complex simulations, try 24 hours and 2000 training iterations.
export MAX_TIME_IN_SEC='43200' # Default is 43200 seconds (12 hours)
export MAX_ITERATIONS='500' # 500 training iterations
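Following the guidance above, the overrides for a particularly complex simulation would look like this (86400 seconds is 24 hours):
export MAX_TIME_IN_SEC='86400' # 24 hours
export MAX_ITERATIONS='2000' # 2000 training iterations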
Early Stopping Criteria
To minimize wasted training time, NativeRL has a built-in early stopping mechanism. There are many stopping criteria, but for now you only need to focus on the one below.
export EPISODE_REWARD_RANGE='0.01' # Stop if the reward doesn't change by more than 1% over 75 iterations
Compute
The settings below are immensely important. They dictate how much “experience” the RL policy collects in each training iteration. More is better, but we are limited by compute (see the sizing example after these settings).
export NUM_WORKERS='4' # More workers means more parallel AnyLogic simulation runs (i.e. episodes)
export NUM_SAMPLES='4' # See the Population-Based Training link above to understand this one
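As a rough sizing rule (an assumption about how the underlying Ray/RLlib trial layout multiplies out, so treat it as a rule of thumb): each of the NUM_SAMPLES trials runs its own NUM_WORKERS simulation workers, so roughly NUM_WORKERS × NUM_SAMPLES AnyLogic runs execute in parallel. Keep that product at or below your available CPU cores.
# Example: 4 trials x 4 workers = ~16 parallel simulation runs
export NUM_WORKERS='4'
export NUM_SAMPLES='4'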