
Assistive Gym v0.1 Wiki

NOTE: This wiki page is a snapshot of the wiki for v0.1 (now deprecated) of Assistive Gym. The information on this page corresponds to the v0.1 branch, which served as the original code release for the paper 'Assistive Gym: A Physics Simulation Framework for Assistive Robotics'.


Assistive Gym

Assistive Gym is a physics-based simulation framework for physical human-robot interaction and robotic assistance.

Assistive Gym is integrated into the OpenAI Gym interface, enabling the use of existing reinforcement learning and control algorithms to teach robots how to interact with people.
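For example, once Assistive Gym is installed, an environment can be created and stepped through with the standard Gym loop. The snippet below is a minimal sketch using random actions in place of a learned policy:

import gym
import assistive_gym  # importing assistive_gym registers its environments with Gym

env = gym.make('FeedingJaco-v0')
observation = env.reset()
for _ in range(200):  # episodes are capped at 200 time steps
    action = env.action_space.sample()  # a learned control policy would go here
    observation, reward, done, info = env.step(action)
    if done:
        break
env.close()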


Paper

A paper on Assistive Gym can be found at https://arxiv.org/pdf/1910.04700.pdf

Z. Erickson, V. Gangaram, A. Kapusta, C. K. Liu, and C. C. Kemp, “Assistive Gym: A Physics Simulation Framework for Assistive Robotics”, IEEE International Conference on Robotics and Automation (ICRA), 2020.

@article{erickson2020assistivegym,
  title={Assistive Gym: A Physics Simulation Framework for Assistive Robotics},
  author={Erickson, Zackory and Gangaram, Vamsee and Kapusta, Ariel and Liu, C. Karen and Kemp, Charles C.},
  journal={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2020}
}

1. Install

Simple installation using a Python virtual environment.

We encourage installing Assistive Gym and the custom PyBullet engine in a Python virtualenv.
NOTE: If you are not using a Python virtual environment, you can replace pip3 install with python3 -m pip install in the commands below.

python3 -m pip install virtualenv
python3 -m venv env
source env/bin/activate
pip3 install git+https://github.com/Zackory/bullet3.git
git clone -b v0.1 https://github.com/Healthcare-Robotics/assistive-gym.git
cd assistive-gym
pip3 install .
# Leave virtual env with: deactivate

Step-by-step installation

Assistive Gym relies on a customized PyBullet physics engine (mostly for enhanced soft-body simulation in Python). It can be installed with:

python3 -m pip install git+https://github.com/Zackory/bullet3.git

We can then install Assistive Gym and all the required dependencies.

git clone -b v0.1 https://github.com/Healthcare-Robotics/assistive-gym.git
cd assistive-gym
python3 -m pip install .

If you want to make changes to Assistive Gym code without having to reinstall, install Assistive Gym in editable mode.

python3 -m pip install -e .

Once Assistive Gym is installed, you can test out various Assistive Gym environments. A full list of all environments provided in Assistive Gym can be found in the Environments section below.

python3 env_viewer.py --env "FeedingJaco-v0"

Installing on Windows

  1. Install Python 3.6.8 at: https://www.python.org/ftp/python/3.6.8/python-3.6.8-amd64.exe
  2. Install git: https://git-scm.com/download/win
  3. Install the Microsoft Visual C++ Build Tools: https://www.microsoft.com/en-us/download/details.aspx?id=48159
  4. Install the C++ Build Tools for Visual Studio: https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2019
    -- during installation, check the "C++ build tools" checkbox.

Then, run the following commands in a command prompt

mkdir git
cd git
python -m pip install --user virtualenv
python -m venv env
env\Scripts\activate
pip install --upgrade setuptools
pip install git+https://github.com/Zackory/bullet3.git
git clone -b v0.1 https://github.com/Healthcare-Robotics/assistive-gym.git
cd assistive-gym
pip install .

2. Environments

Test Out Environments

We provide a simple script for visualizing the various environments available in Assistive Gym.
The command below will render a wheelchair-mounted Jaco robot taking random actions within a feeding assistance environment with a static person.

python3 env_viewer.py --env "FeedingJaco-v0"

Environments

6 assistive tasks (ScratchItch, BedBathing, Feeding, Drinking, Dressing, and ArmManipulation).
4 commercial robots (PR2, Jaco, Baxter, Sawyer).
2 human states: static or active (the human takes actions according to a separate control policy).
48 total environment combinations (6 tasks × 4 robots × 2 human states).
Static human environment names: [task][robot]-v0
Active human environment names: [task][robot]Human-v0
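Because environment names follow these two patterns, all 48 names can be generated programmatically. A small illustrative snippet:

tasks = ['ScratchItch', 'BedBathing', 'Feeding', 'Drinking', 'Dressing', 'ArmManipulation']
robots = ['PR2', 'Jaco', 'Baxter', 'Sawyer']
env_names = ['%s%s%s-v0' % (task, robot, human) for task in tasks for robot in robots for human in ['', 'Human']]
print(len(env_names))  # 48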

Itch Scratching Assistance

Static human
ScratchItchPR2-v0, ScratchItchJaco-v0, ScratchItchBaxter-v0, ScratchItchSawyer-v0
Active human
ScratchItchPR2Human-v0, ScratchItchJacoHuman-v0, ScratchItchBaxterHuman-v0, ScratchItchSawyerHuman-v0

Bed Bathing Assistance

Static human
BedBathingPR2-v0, BedBathingJaco-v0, BedBathingBaxter-v0, BedBathingSawyer-v0
Active human
BedBathingPR2Human-v0, BedBathingJacoHuman-v0, BedBathingBaxterHuman-v0, BedBathingSawyerHuman-v0

Feeding Assistance

Static human
FeedingPR2-v0, FeedingJaco-v0, FeedingBaxter-v0, FeedingSawyer-v0
Active human
FeedingPR2Human-v0, FeedingJacoHuman-v0, FeedingBaxterHuman-v0, FeedingSawyerHuman-v0

Drinking Assistance

Static human
DrinkingPR2-v0, DrinkingJaco-v0, DrinkingBaxter-v0, DrinkingSawyer-v0
Active human
DrinkingPR2Human-v0, DrinkingJacoHuman-v0, DrinkingBaxterHuman-v0, DrinkingSawyerHuman-v0

Dressing Assistance

Static human
DressingPR2-v0, DressingJaco-v0, DressingBaxter-v0, DressingSawyer-v0
Active human
DressingPR2Human-v0, DressingJacoHuman-v0, DressingBaxterHuman-v0, DressingSawyerHuman-v0

Arm Manipulation Assistance

Static human
ArmManipulationPR2-v0, ArmManipulationJaco-v0, ArmManipulationBaxter-v0, ArmManipulationSawyer-v0
Active human
ArmManipulationPR2Human-v0, ArmManipulationJacoHuman-v0, ArmManipulationBaxterHuman-v0, ArmManipulationSawyerHuman-v0


3. Getting Started

A quick guide for simulating robots helping people with Assistive Gym.

1. Installation

We first install Assistive Gym and its dependencies.
If you prefer to install within a Python virtual environment, see the Install section above.

python3 -m pip install git+https://github.com/Zackory/bullet3.git
git clone -b v0.1 https://github.com/Healthcare-Robotics/assistive-gym.git
cd assistive-gym
python3 -m pip install .

2. Test out a few environments

We can quickly visualize some of the existing assistive robotics environments using env_viewer.py.
Here, a Sawyer robot will help clean the arm of a person who is lying in a resting pose on a bed.

python3 env_viewer.py --env "BedBathingSawyer-v0"

Bed Bathing Sawyer

We can also visualize a collaborative assistance environment where both the robot and human take random actions.

python3 env_viewer.py --env "ScratchItchJacoHuman-v0"

Itch Scratching Jaco Human

When creating new environments, this viewer can also be helpful for visualizing and debugging changes.

3. Running a pretrained control policy

We provide trained robot/human control policies for all of the available environments.
These control policies were trained using Proximal Policy Optimization (PPO), implemented in a PyTorch library.
If you do not have wget installed on your machine, you can download the models directly from the GitHub release page.

Downloading policies and libraries

python3 -m pip install git+https://github.com/Zackory/pytorch-a2c-ppo-acktr --no-cache-dir
python3 -m pip install git+https://github.com/openai/baselines.git
# Download pretrained policies
wget -O trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo

Here we will run one of these policies for a PR2 providing drinking assistance to a person who has learned to actively collaborate with the robot through co-optimization.

python3 -m ppo.enjoy_coop --env-name "DrinkingPR2Human-v0"

4. Train a new control policy

New control policies for Assistive Gym can be trained using any library that works with the OpenAI Gym interface.
Here, we can train a new policy for a Jaco robot helping to scratch an itch using the PyTorch library from above.

python3 -m ppo.train --env-name "ScratchItchJaco-v0" --num-env-steps 20000 --save-dir ./trained_models_new/

For this example, we train for only 20,000 time steps (100 simulation rollouts), but increasing --num-env-steps can greatly improve the learned policy. The pretrained policies we provide were each trained for 10,000,000 time steps (50,000 simulation rollouts).
We can now visualize the newly learned control policy:

python3 -m ppo.enjoy --env-name "ScratchItchJaco-v0" --load-dir trained_models_new/ppo

4. Running Pretrained Policies

We provide pretrained control policies for each robot and assistive task.
These controllers are trained using Proximal Policy Optimization (PPO) implemented in PyTorch.
The pretrained models were trained for 10,000,000 time steps (50,000 simulation rollouts) on a 36 virtual core AWS machine.

Download library and models

The PyTorch library and pretrained policies can be downloaded using the commands below.
If you do not have wget installed on your machine, you can download the models directly from the GitHub release page.
You may also need to install OpenCV for OpenAI Baselines.
Ubuntu: sudo apt-get install python3-opencv
Mac: brew install opencv

# Install pytorch RL library
pip3 install git+https://github.com/Zackory/pytorch-a2c-ppo-acktr --no-cache-dir
# Install OpenAI Baselines 0.1.6
pip3 install git+https://github.com/openai/baselines.git
# Download pretrained policies
wget -O trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo
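If wget is not available, the same archive can be downloaded with curl instead (an equivalent alternative to the wget command above):

curl -L -o trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo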

Robot assisting a static person

Here we evaluate a pretrained policy for a Baxter robot scratching an itch on a person's right arm, while the person holds a static pose in a wheelchair.

python3 -m ppo.enjoy --env-name "ScratchItchBaxter-v0"

Collaborative assistance - robot assisting an active human

We also provide pretrained policies for a robot and human that learned to collaborate to achieve the same assistive task. Both the robot and the human have separate control policies, which are trained simultaneously via co-optimization.

python3 -m ppo.enjoy_coop --env-name "DrinkingSawyerHuman-v0"

Evaluating and comparing policies over 100 trials

We can also compare control policies for a given assistive task. We evaluate a policy over 100 simulation rollouts of the task to calculate the average reward and task success.

python3 -m ppo.enjoy_100trials --env-name "FeedingPR2-v0"

We can also compare policies for collaborative assistance environments where both the robot and human take actions according to co-optimized policies.

python3 -m ppo.enjoy_coop_100trials --env-name "BedBathingJacoHuman-v0"
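If you are using a different RL library, the same statistics can be computed manually with a plain Gym loop. A rough sketch (policy() is a hypothetical placeholder for your trained policy; 'task_success' is the info key reported by the environments in this release):

import gym
import assistive_gym
import numpy as np

env = gym.make('FeedingPR2-v0')
rewards, successes = [], []
for _ in range(100):
    obs = env.reset()
    done, total_reward, info = False, 0.0, {}
    while not done:
        obs, reward, done, info = env.step(policy(obs))  # policy() is hypothetical
        total_reward += reward
    rewards.append(total_reward)
    successes.append(info.get('task_success', 0))
print('Mean reward:', np.mean(rewards), 'Success rate:', np.mean(successes))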

5. Training New Policies

Training a new control policy can be accomplished using any control learning library that works with the OpenAI Gym interface.
We provide a customized PyTorch library that we used for training and evaluating control policies in Assistive Gym.
Details on installing this PyTorch library can be found in the previous section Running Pretrained Policies.

Robot assisting a static person

Here we will train a policy for a Jaco robot providing itch scratching assistance to a person who holds a static pose.
We will train the policy over 2,000,000 simulation time steps (10,000 rollouts).

python3 -m ppo.train --env-name "ScratchItchJaco-v0" --num-env-steps 2000000 --save-dir ./trained_models_new/

We can then evaluate the recently trained policy.

python3 -m ppo.enjoy --env-name "ScratchItchJaco-v0" --load-dir trained_models_new/ppo

When training on remote machines, such as AWS, we suggest using nohup to train policies in the background, disconnected from the terminal instance.

nohup python3 -m ppo.train --env-name "DrinkingSawyer-v0" --num-env-steps 10000000 --save-dir ./trained_models_new/ > nohup.out &
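Since console output is redirected to nohup.out, training progress can be followed with:

tail -f nohup.out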

Collaborative assistance via co-optimization

We can also train policies for robots collaborating with an active human.
Both the robot and human are active agents that have their own separate control policies, which are trained simultaneously using co-optimization.

python3 -m ppo.train_coop --env-name "FeedingPR2Human-v0" --num-env-steps 10000000 --save-dir ./trained_models_new/

NOTE: Training robot and human policies simultaneously requires co-optimization, which most RL libraries do not implement.
Do not use single-agent RL approaches (such as ppo.train) for collaborative assistance, as this allows the robot's control policy to cheat by seeing the human's observations.
Any environment name that contains 'Human' is a collaborative assistance environment.
To see how co-optimization is implemented, you can compare the train.py and train_coop.py scripts.
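The essential idea is that, in 'Human' environments, the observation and action vectors are concatenations of a robot portion and a human portion (their lengths are reported in the info dictionary as obs_robot_len, obs_human_len, action_robot_len, and action_human_len). Below is a rough sketch of stepping such an environment with two separate policies, assuming the environment exposes obs_robot_len and that the robot portion comes first in both vectors; the actual train_coop.py implementation may differ:

import gym
import assistive_gym
import numpy as np

env = gym.make('FeedingPR2Human-v0')
obs = env.reset()
obs_robot = obs[:env.obs_robot_len]   # the robot's policy only sees the robot observations
obs_human = obs[env.obs_robot_len:]   # the human's policy only sees the human observations
action_robot = robot_policy(obs_robot)  # robot_policy and human_policy are hypothetical placeholders
action_human = human_policy(obs_human)
obs, reward, done, info = env.step(np.concatenate([action_robot, action_human]))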

Using your own control learning library

If you are already familiar with the OpenAI Gym interface or reinforcement learning, you can use your favorite control learning algorithms and libraries with the installed Assistive Gym environments.
For example, you can use OpenAI Baselines to train policies for the robot.

python3 -m baselines.run --alg=ppo2 --env=assistive_gym:ScratchItchJaco-v0 --network=mlp --num_timesteps=1e7

6. Creating a New Assistive Environment

Install in editable mode

When creating new environments, it is helpful to install Assistive Gym in editable mode.

cd assistive-gym
python3 -m pip install -e .

Template

Here is the basic template for creating a new environment in Assistive Gym.
We would create a new environment file: assistive-gym/assistive_gym/envs/new_task.py

import os
from gym import spaces
import numpy as np
import pybullet as p
from .env import AssistiveEnv

class NewTaskEnv(AssistiveEnv):
    def __init__(self, robot_type='pr2', human_control=False):
        ...
    def step(self, action):
        ...
    def _get_obs(self, forces, forces_human):
        ...
    def reset(self):
        ...

step() is called each time the robot executes a new action.

_get_obs() is called when we want the current observations available to the robot.

reset() resets the environment and is called prior to the beginning of each simulation rollout.

We then need to create the file assistive-gym/assistive_gym/envs/new_task_robots.py

from .new_task import NewTaskEnv

class NewTaskPR2Env(NewTaskEnv):
    def __init__(self):
        super(NewTaskPR2Env, self).__init__(robot_type='pr2', human_control=False)

We then add the following line to assistive-gym/assistive_gym/envs/__init__.py

from assistive_gym.envs.new_task_robots import NewTaskPR2Env

And the following lines to assistive-gym/assistive_gym/__init__.py

# NewTask PR2
register(
    id='NewTaskPR2-v0',
    entry_point='assistive_gym.envs:NewTaskPR2Env',
    max_episode_steps=200,
)
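After registration, the new environment can be created by name just like the built-in environments:

import gym
import assistive_gym

env = gym.make('NewTaskPR2-v0')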

A fully functioning example environment

Let's create a new environment where a robot needs to reach towards a person's mouth. We will implement this environment for both the PR2 and Jaco robots.
Additionally, this environment will support both a static human and an active human agent (where we can learn a separate policy for controlling the person's head through co-optimization).
This will be a simplified version of the feeding.py environment.
We begin by creating a new file: assistive-gym/assistive_gym/envs/reaching.py

import os
from gym import spaces
import numpy as np
import pybullet as p
from .env import AssistiveEnv

class ReachingEnv(AssistiveEnv):
    def __init__(self, robot_type='pr2', human_control=False):
        super(ReachingEnv, self).__init__(robot_type=robot_type, task='reaching', human_control=human_control, frame_skip=5, time_step=0.02, action_robot_len=7, action_human_len=(4 if human_control else 0), obs_robot_len=21, obs_human_len=(19 if human_control else 0))

    def step(self, action):
        # Execute action
        self.take_step(action, robot_arm='right', gains=self.config('robot_gains'), forces=self.config('robot_forces'), human_gains=0.05)

        # Get total force the robot's body is applying to the person
        robot_force_on_human = 0
        for c in p.getContactPoints(bodyA=self.robot, bodyB=self.human, physicsClientId=self.id):
            robot_force_on_human += c[9]

        # Get the Euclidean distance between the robot's end effector and the person's mouth
        gripper_pos = np.array(p.getLinkState(self.robot, 54 if self.robot_type=='pr2' else 8, computeForwardKinematics=True, physicsClientId=self.id)[0])
        reward_distance_mouth = -np.linalg.norm(gripper_pos - self.target_pos)
        # Get end effector velocity
        end_effector_velocity = np.linalg.norm(p.getLinkState(self.robot, 76 if self.robot_type=='pr2' else 8, computeForwardKinematics=True, computeLinkVelocity=True, physicsClientId=self.id)[6])

        # Get human preferences
        preferences_score = self.human_preferences(end_effector_velocity=end_effector_velocity, total_force_on_human=robot_force_on_human)
        # Get observations and reward
        obs = self._get_obs([], [robot_force_on_human])
        reward = self.config('distance_weight')*reward_distance_mouth + preferences_score
        info = {'total_force_on_human': robot_force_on_human, 'task_success': int(reward_distance_mouth <= self.config('task_success_threshold')), 'action_robot_len': self.action_robot_len, 'action_human_len': self.action_human_len, 'obs_robot_len': self.obs_robot_len, 'obs_human_len': self.obs_human_len}
        done = False
        return obs, reward, done, info

    def _get_obs(self, forces, forces_human):
        torso_pos = np.array(p.getLinkState(self.robot, 15 if self.robot_type == 'pr2' else 0, computeForwardKinematics=True, physicsClientId=self.id)[0])
        robot_right_joint_positions = np.array([x[0] for x in p.getJointStates(self.robot, jointIndices=self.robot_right_arm_joint_indices, physicsClientId=self.id)])
        gripper_pos, gripper_orient = p.getLinkState(self.robot, 54 if self.robot_type=='pr2' else 8, computeForwardKinematics=True, physicsClientId=self.id)[:2]
        if self.human_control:
            human_pos = np.array(p.getBasePositionAndOrientation(self.human, physicsClientId=self.id)[0])
            human_joint_positions = np.array([x[0] for x in p.getJointStates(self.human, jointIndices=self.human_controllable_joint_indices, physicsClientId=self.id)])
        head_pos, head_orient = p.getLinkState(self.human, 23, computeForwardKinematics=True, physicsClientId=self.id)[:2]

        robot_obs = np.concatenate([gripper_pos-torso_pos, gripper_pos-self.target_pos, robot_right_joint_positions, gripper_orient, head_orient, forces]).ravel()
        human_obs = np.concatenate([gripper_pos-human_pos, gripper_pos-self.target_pos, human_joint_positions, gripper_orient, head_orient, forces_human]).ravel() if self.human_control else []
        return np.concatenate([robot_obs, human_obs]).ravel()

    def reset(self):
        # Create the human, wheelchair, and robot
        self.human, self.wheelchair, self.robot, self.robot_lower_limits, self.robot_upper_limits, self.human_lower_limits, self.human_upper_limits, self.robot_right_arm_joint_indices, self.robot_left_arm_joint_indices, self.gender = self.world_creation.create_new_world(furniture_type='wheelchair', static_human_base=True, human_impairment='random', print_joints=False, gender='random')
        self.robot_lower_limits = self.robot_lower_limits[self.robot_right_arm_joint_indices]
        self.robot_upper_limits = self.robot_upper_limits[self.robot_right_arm_joint_indices]
        self.reset_robot_joints()
        if self.robot_type == 'jaco':
            # Attach the Jaco robot to the wheelchair
            wheelchair_pos, wheelchair_orient = p.getBasePositionAndOrientation(self.wheelchair, physicsClientId=self.id)
            p.resetBasePositionAndOrientation(self.robot, np.array(wheelchair_pos) + np.array([-0.35, -0.3, 0.3]), p.getQuaternionFromEuler([0, 0, 0], physicsClientId=self.id), physicsClientId=self.id)

        # Configure the person
        joints_positions = [(6, np.deg2rad(-90)), (16, np.deg2rad(-90)), (28, np.deg2rad(-90)), (31, np.deg2rad(80)), (35, np.deg2rad(-90)), (38, np.deg2rad(80))]
        # Randomize head orientation
        joints_positions += [(21, self.np_random.uniform(np.deg2rad(-30), np.deg2rad(30))), (22, self.np_random.uniform(np.deg2rad(-30), np.deg2rad(30))), (23, self.np_random.uniform(np.deg2rad(-30), np.deg2rad(30)))]
        self.human_controllable_joint_indices = [20, 21, 22, 23]
        self.world_creation.setup_human_joints(self.human, joints_positions, self.human_controllable_joint_indices if (self.human_control or self.world_creation.human_impairment == 'tremor') else [], use_static_joints=True, human_reactive_force=None)
        p.resetBasePositionAndOrientation(self.human, [0, 0.03, 0.89 if self.gender == 'male' else 0.86], [0, 0, 0, 1], physicsClientId=self.id)
        self.target_human_joint_positions = np.array([x[0] for x in p.getJointStates(self.human, jointIndices=self.human_controllable_joint_indices, physicsClientId=self.id)])
        self.human_lower_limits = self.human_lower_limits[self.human_controllable_joint_indices]
        self.human_upper_limits = self.human_upper_limits[self.human_controllable_joint_indices]

        # Create a target on the person's mouth
        self.mouth_pos = [0, -0.11, 0.03] if self.gender == 'male' else [0, -0.1, 0.03]
        head_pos, head_orient = p.getLinkState(self.human, 23, computeForwardKinematics=True, physicsClientId=self.id)[:2]
        target_pos, target_orient = p.multiplyTransforms(head_pos, head_orient, self.mouth_pos, [0, 0, 0, 1], physicsClientId=self.id)
        self.target_pos = np.array(target_pos)
        sphere_collision = -1
        sphere_visual = p.createVisualShape(shapeType=p.GEOM_SPHERE, radius=0.01, rgbaColor=[0, 1, 0, 1], physicsClientId=self.id)
        self.target = p.createMultiBody(baseMass=0.0, baseCollisionShapeIndex=sphere_collision, baseVisualShapeIndex=sphere_visual, basePosition=self.target_pos, useMaximalCoordinates=False, physicsClientId=self.id)

        # Change default camera position/angle
        p.resetDebugVisualizerCamera(cameraDistance=1.10, cameraYaw=40, cameraPitch=-45, cameraTargetPosition=[-0.2, 0, 0.75], physicsClientId=self.id)

        # Initialize the starting location of the robot's end effector
        gripper_target_pos = np.array([-0.15, -0.4, 1]) + self.np_random.uniform(-0.05, 0.05, size=3)
        if self.robot_type == 'pr2':
            # Call self.position_robot_toc() to optimize the robot's base position and then position the end effector using IK.
            gripper_target_orient = p.getQuaternionFromEuler([np.pi/2.0, 0, 0], physicsClientId=self.id)
            self.position_robot_toc(self.robot, 54, [(gripper_target_pos, gripper_target_orient), (self.target_pos, None)], [(self.target_pos, None)], self.robot_right_arm_joint_indices, self.robot_lower_limits, self.robot_upper_limits, ik_indices=range(15, 15+7), pos_offset=np.array([0.1, 0.2, 0]), max_ik_iterations=200, step_sim=True, check_env_collisions=False, human_joint_indices=self.human_controllable_joint_indices, human_joint_positions=self.target_human_joint_positions)
            self.world_creation.set_gripper_open_position(self.robot, position=0.03, left=False, set_instantly=True)
        elif self.robot_type == 'jaco':
            # The Jaco is already attached to the wheelchair, so we just need to position the end effector using IK.
            gripper_target_orient = p.getQuaternionFromEuler(np.array([np.pi/2.0, 0, np.pi/2.0]), physicsClientId=self.id)
            self.util.ik_random_restarts(self.robot, 8, gripper_target_pos, gripper_target_orient, self.world_creation, self.robot_right_arm_joint_indices, self.robot_lower_limits, self.robot_upper_limits, ik_indices=[0, 1, 2, 3, 4, 5, 6], max_iterations=1000, max_ik_random_restarts=40, random_restart_threshold=0.01, step_sim=True, check_env_collisions=True)
            self.world_creation.set_gripper_open_position(self.robot, position=1.33, left=False, set_instantly=True)

        # Disable gravity for the robot and human
        p.setGravity(0, 0, -9.81, physicsClientId=self.id)
        p.setGravity(0, 0, 0, body=self.robot, physicsClientId=self.id)
        p.setGravity(0, 0, 0, body=self.human, physicsClientId=self.id)

        # Enable rendering
        p.configureDebugVisualizer(p.COV_ENABLE_RENDERING, 1, physicsClientId=self.id)

        return self._get_obs([], [0])

    def update_targets(self):
        # update_targets() is automatically called at each time step for updating any targets in the environment.
        # Move the target marker onto the person's mouth
        head_pos, head_orient = p.getLinkState(self.human, 23, computeForwardKinematics=True, physicsClientId=self.id)[:2]
        target_pos, target_orient = p.multiplyTransforms(head_pos, head_orient, self.mouth_pos, [0, 0, 0, 1], physicsClientId=self.id)
        self.target_pos = np.array(target_pos)
        p.resetBasePositionAndOrientation(self.target, self.target_pos, [0, 0, 0, 1], physicsClientId=self.id)

Next, create the file: assistive-gym/assistive_gym/envs/reaching_robots.py

from .reaching import ReachingEnv

class ReachingPR2Env(ReachingEnv):
    def __init__(self):
        super(ReachingPR2Env, self).__init__(robot_type='pr2', human_control=False)
class ReachingJacoEnv(ReachingEnv):
    def __init__(self):
        super(ReachingJacoEnv, self).__init__(robot_type='jaco', human_control=False)
class ReachingPR2HumanEnv(ReachingEnv):
    def __init__(self):
        super(ReachingPR2HumanEnv, self).__init__(robot_type='pr2', human_control=True)
class ReachingJacoHumanEnv(ReachingEnv):
    def __init__(self):
        super(ReachingJacoHumanEnv, self).__init__(robot_type='jaco', human_control=True)

Add the following line to: assistive-gym/assistive_gym/envs/__init__.py

from assistive_gym.envs.reaching_robots import ReachingPR2Env, ReachingJacoEnv, ReachingPR2HumanEnv, ReachingJacoHumanEnv

And the following lines to: assistive-gym/assistive_gym/__init__.py

register(
    id='ReachingPR2-v0',
    entry_point='assistive_gym.envs:ReachingPR2Env',
    max_episode_steps=200,
)
register(
    id='ReachingJaco-v0',
    entry_point='assistive_gym.envs:ReachingJacoEnv',
    max_episode_steps=200,
)
register(
    id='ReachingPR2Human-v0',
    entry_point='assistive_gym.envs:ReachingPR2HumanEnv',
    max_episode_steps=200,
)
register(
    id='ReachingJacoHuman-v0',
    entry_point='assistive_gym.envs:ReachingJacoHumanEnv',
    max_episode_steps=200,
)

Finally, we would add the following lines to the configuration file,
assistive-gym/assistive_gym/config.ini

[reaching]
robot_forces = 1.0
robot_gains = 0.05
distance_weight = 1.0
task_success_threshold = 0.03
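With the environment implemented, registered, and configured, it can be sanity-checked in the environment viewer before training:

python3 env_viewer.py --env "ReachingJaco-v0"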

Training and evaluating a policy

We can now train a policy on our new reaching environment. 1,000,000 time steps should be enough to learn a reasonable control policy.

python3 -m ppo.train --env-name "ReachingJaco-v0" --num-env-steps 1000000 --save-dir ./trained_models_new/

Now, let's test out our new policy.

python3 -m ppo.enjoy --env-name "ReachingJaco-v0" --load-dir trained_models_new/ppo