5. Training New Policies

Training a new control policy can be accomplished using any control learning library that works with the OpenAI Gym interface.
See this Google Colab script for several examples of reinforcement learning with Assistive Gym.
Assistive Gym comes with built-in functions for easily training and evaluating reinforcement learning (RL) policies using RLlib. These functions are available in learn.py.
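
Because each Assistive Gym environment follows the standard OpenAI Gym interface, a rollout can also be run directly from Python without learn.py. The snippet below is a minimal sketch assuming the classic Gym step API that Assistive Gym v1.0 uses; random actions stand in for a trained policy, and env.render() is called before the first reset to open the PyBullet viewer.

import gym
import assistive_gym  # registers the Assistive Gym environments with Gym

env = gym.make('assistive_gym:FeedingSawyer-v1')
env.render()  # open the PyBullet viewer before the first reset
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # a trained policy would choose the action here
    observation, reward, done, info = env.step(action)
env.close()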

Robot assisting a static person

Train: First, let's train proximal policy optimization (PPO) for a Sawyer robot providing feeding assistance to a person who holds a static pose. We will only train for 100,000 time steps (500 rollouts), although > 1,000,000 would result in better policies.

python3 -m assistive_gym.learn --env "FeedingSawyer-v1" --algo ppo --train --train-timesteps 100000 --save-dir ./trained_models/

When training on remote machines, such as AWS, we suggest using nohup to train policies in the background, disconnected from the terminal instance.

nohup python3 -m assistive_gym.learn --env "FeedingSawyer-v1" --algo ppo --train --train-timesteps 100000 --save-dir ./trained_models/ > nohup.out &
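
With the output redirected as above, training progress can then be monitored with tail -f nohup.out.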

Resume training: We can resume training (or fine-tune) an existing policy using the --load-policy-path argument.

python3 -m assistive_gym.learn --env "FeedingSawyer-v1" --algo ppo --train --train-timesteps 150000 --save-dir ./trained_models/ --load-policy-path ./trained_models/

Render: Then, we can render rollouts of this trained policy; the --render-episodes argument sets how many episodes to render (10 in the example below).

python3 -m assistive_gym.learn --env "FeedingSawyer-v1" --algo ppo --render --seed 0 --load-policy-path ./trained_models/ --render-episodes 10

Evaluate: We can also evaluate the average reward and task success of our new policy over 100 simulation trials.

python3 -m assistive_gym.learn --env "FeedingSawyer-v1" --algo ppo --evaluate --eval-episodes 100 --seed 0 --verbose --load-policy-path ./trained_models/
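
The same metrics can also be approximated directly through the Gym interface. The sketch below is a rough example rather than learn.py's implementation: it assumes each environment reports its task success in the info dictionary under the 'task_success' key, and it uses random actions in place of a trained policy.

import gym
import numpy as np
import assistive_gym

env = gym.make('assistive_gym:FeedingSawyer-v1')
rewards, successes = [], []
for episode in range(100):
    observation = env.reset()
    done, total_reward, info = False, 0.0, {}
    while not done:
        action = env.action_space.sample()  # substitute your trained policy here
        observation, reward, done, info = env.step(action)
        total_reward += reward
    rewards.append(total_reward)
    successes.append(info.get('task_success', 0))  # assumed info key; verify against the environment
print('Mean reward:', np.mean(rewards), 'Mean task success:', np.mean(successes))
env.close()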

Collaborative assistance via co-optimization

We can also train policies for robots collaborating with an active human.
We model human motion in Assistive Gym using a technique called co-optimization. This works by simultaneously training separate policies for both the robot and human. Both policies receive the same reward, but have separate observations.
In the example below, a Sawyer will learn to collaboratively assist with feeding a person who can move their head.

python3 -m assistive_gym.learn --env "FeedingSawyerHuman-v1" --algo ppo --train --train-timesteps 150000 --save-dir ./trained_models/

NOTE: Co-optimization in Assistive Gym is currently only supported in RLlib, or by libraries that leverage RLlib's MultiAgentEnv base class. These collaborative assistance environments are multi-agent environments where both the robot and human are optimizing control policies simultaneously. Any environment name that contains 'Human' is a collaborative assistance environment.
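
For reference, a collaborative environment's interface looks roughly like the sketch below. The dictionary-of-agents structure follows RLlib's MultiAgentEnv convention; the 'robot'/'human' keys and the action_space_robot / action_space_human attributes are assumptions drawn from that convention, so verify them against the environment class before relying on them.

import gym
import assistive_gym

env = gym.make('assistive_gym:FeedingSawyerHuman-v1')
observations = env.reset()  # a dict of per-agent observations, e.g. {'robot': ..., 'human': ...}
done = False
while not done:
    # Both agents act every step; random actions stand in for the two learned policies.
    actions = {'robot': env.action_space_robot.sample(),   # assumed attribute names
               'human': env.action_space_human.sample()}
    observations, rewards, dones, infos = env.step(actions)
    done = dones['__all__']  # MultiAgentEnv convention: '__all__' marks the end of the episode
env.close()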

Using your own control learning library

If you are already familiar with the OpenAI Gym interface or with reinforcement learning, you can use your favorite control learning algorithms and libraries with the installed Assistive Gym environments.
For example, you can use OpenAI Baselines to train policies for the robot.

python3 -m baselines.run --alg=ppo2 --env=assistive_gym:ScratchItchJaco-v1 --network=mlp --num_timesteps=1e7
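
Other Gym-compatible libraries work similarly. Below is a brief sketch using Stable-Baselines3 and its standard PPO API; it assumes a Stable-Baselines3 version compatible with the classic Gym API, installed separately (it is not bundled with Assistive Gym), and the output filename is just an illustration.

import gym
import assistive_gym
from stable_baselines3 import PPO

env = gym.make('assistive_gym:FeedingSawyer-v1')
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
model.save('ppo_feeding_sawyer')  # hypothetical output filename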