4. Running Pretrained Policies - Healthcare-Robotics/assistive-gym GitHub Wiki
We provide pretrained control policies for each robot and assistive task.
NOTE: Pretrained policies are not yet done training for v1.0. Until then, v0.1 policies are available and the instructions below correspond to the v0.1 policies. You can install v0.1 of Assistive Gym to use these policies.
These controllers are trained using Proximal Policy Optimization (PPO) implemented in PyTorch.
The pretrained models were trained for 10,000,000 time steps (50,000 simulation rollouts) on an AWS machine with 36 virtual cores.
Download library and models
The PyTorch library and pretrained policies can be downloaded using the commands below.
If you do not have wget installed on your machine, you can download the models directly from the GitHub release page.
You may also need to install OpenCV for OpenAI Baselines. Ubuntu: sudo apt-get install python3-opencv
Mac: brew install opencv
# Install pytorch RL library
pip3 install git+https://github.com/Zackory/pytorch-a2c-ppo-acktr --no-cache-dir
# Install OpenAI Baselines 0.1.6
pip3 install git+https://github.com/openai/baselines.git
# Download pretrained policies
wget -O trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo
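If wget and unzip are unavailable, the same download and extraction can be done with Python's standard library. This is a minimal sketch, not part of the repository; the function names are illustrative, and the URL and destination directory mirror the wget command above:

```python
import os
import urllib.request
import zipfile

def extract_zip(zip_path, dest_dir):
    """Extract every member of a zip archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()

def download_policies(url, dest_dir):
    """Fetch the pretrained-policy archive and unpack it into the trained-models directory."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, "pretrained_policies.zip")
    urllib.request.urlretrieve(url, zip_path)
    return extract_zip(zip_path, dest_dir)

# download_policies(
#     "https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip",
#     "trained_models/ppo")
```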
Robot assisting a static person
Here we evaluate a pretrained policy for a Baxter robot assisting with scratching an itch on a person's right arm, while the person sits in a wheelchair in a static pose.
python3 -m ppo.enjoy --env-name "ScratchItchBaxter-v0"
Collaborative assistance - robot assisting an active human
We also provide pretrained policies for a robot and a human that learned to collaborate on the same assistive task. The robot and the human have separate control policies that are trained simultaneously via co-optimization.
python3 -m ppo.enjoy_coop --env-name "DrinkingSawyerHuman-v0"
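One common way to drive two co-optimized policies in a single simulator is to split the observation between the agents and concatenate their actions before stepping the environment. The sketch below illustrates that pattern only; the helper name, the split index, and the assumption that the environment accepts one concatenated action vector are ours, not the actual Assistive Gym interface:

```python
def coop_step(env_step, robot_policy, human_policy, obs, robot_obs_dim):
    """One collaborative control step (hypothetical helper):
    give each agent its slice of the observation, query both
    policies, and send the concatenated action to the simulator."""
    robot_obs = obs[:robot_obs_dim]
    human_obs = obs[robot_obs_dim:]
    action = list(robot_policy(robot_obs)) + list(human_policy(human_obs))
    # env_step follows the Gym convention: (obs, reward, done, info)
    return env_step(action)
```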
Evaluating and comparing policies over 100 trials
We can also compare control policies for a given assistive task. We evaluate a policy over 100 simulation rollouts of the task to calculate the average reward and task success.
python3 -m ppo.enjoy_100trials --env-name "FeedingPR2-v0"
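The aggregation behind this comparison is simple: average the total reward over the rollouts and report the fraction that succeeded. A minimal sketch, assuming per-rollout rewards and success flags have already been collected (the helper name is illustrative, not from the evaluation scripts):

```python
import statistics

def summarize_rollouts(rewards, successes):
    """Reduce per-rollout total rewards and boolean success flags to
    the two summary numbers used for comparing policies:
    (mean reward, task success rate)."""
    return statistics.mean(rewards), sum(successes) / len(successes)
```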
We can also compare policies for collaborative assistance environments where both the robot and human take actions according to co-optimized policies.
python3 -m ppo.enjoy_coop_100trials --env-name "BedBathingJacoHuman-v0"