Graph based DRL - learningLogisticsLab/binPicking GitHub Wiki
State Representation
Using a flat spatial representation limits the robot's perspective on the environment and the objects in it. Visuo-motor representations (RGB(D) + spatial information) have been shown to be richer [QT-OPT, Q2-OPT].
Invariant Transformations
So we want to use RGB(D) data + spatial data as our state representation. We also want to apply invariant transformations to our state data, i.e. rotations, translations, and reflections (to understand invariant (or equivariant) transformations, see ITER). However, synchronizing or calibrating the transformations of images with those of spatial data is not straightforward. Enter graph representations.
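To make the synchronization problem concrete, here is a minimal sketch (all values illustrative) of why the two data modalities transform differently: spatial data rotates via a matrix multiply, while image data rotates via a pixel-grid operation (and arbitrary angles would additionally need interpolation).

```python
import numpy as np

theta = np.pi / 2  # a 90-degree counter-clockwise rotation

# Spatial data: a relative object position, rotated with a 2D rotation matrix.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rel_pos = np.array([1.0, 0.0])
rotated_pos = R @ rel_pos  # -> approximately [0, 1]

# Image data: the same rotation is a separate pixel-grid operation that
# must be kept consistent (axis conventions, interpolation) by hand.
img = np.arange(9).reshape(3, 3)
rotated_img = np.rot90(img)  # counter-clockwise by numpy's convention
```

Keeping these two operations calibrated for every augmentation is exactly the bookkeeping a graph representation lets us avoid.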
Graphs
Graph neural networks have recently gained a lot of traction (for example, see this brief overview, or the definition and code in the VSGAT paper). By using graphs, we may get the best of both worlds. Let's introduce graphs very simply below.
Graphs consist of nodes and edges. Nodes can represent the RGB(D) data of the segmented objects as well as the robot gripper, which gives us visual feedback from the environment. Edges can contain relevant spatial information between objects. See Hindsight Experience Replay for an example of relevant spatial information.
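The node/edge split above can be sketched in plain Python (all names and sizes here are illustrative assumptions; in practice the node features would come from a CNN embedding of each segmented crop):

```python
import numpy as np

def build_graph(node_features, poses):
    """Build a simple state graph.

    node_features: dict entity -> visual feature vector (e.g. CNN embedding).
    poses: dict entity -> 3D position.
    Returns the node features plus an edge dict keyed by (i, j) holding
    the relative displacement of entity j with respect to entity i.
    """
    edges = {}
    for i in poses:
        for j in poses:
            if i != j:
                edges[(i, j)] = poses[j] - poses[i]  # relative spatial info
    return node_features, edges

# Illustrative 16-dim "visual" features for the gripper and one object.
nodes = {"gripper": np.zeros(16), "obj_1": np.ones(16)}
poses = {"gripper": np.array([0.0, 0.0, 0.5]),
         "obj_1":   np.array([0.2, 0.1, 0.0])}
node_feats, edge_feats = build_graph(nodes, poses)
```

Because the edges store *relative* displacements, the spatial part of the state is already translation-invariant by construction.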
Motivating Problem
Now, say you segment only specific objects, and you have a reaching task where a robot arm descends on objects, as in the QT-OPT setting (or many others like it). To learn faster, we take the robot's experience and pretend the robot had done the same thing with the scene rotated by some angle (or multiple instances across multiple angles). In that case, the shape of the trajectory remains unchanged, but its orientation and angular velocity change. The nodes containing the segmented objects do not change, since all entities in the scene were transformed in the same way. We could choose any number of transformations instead of just rotations, and we can also compound them (e.g. rotations followed by translations followed by more translations).
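A sketch of this augmentation idea, under the assumed graph layout above (a dict of relative-displacement edges; this is an illustration, not a definitive implementation): the node (appearance) features are left untouched, and only the relative spatial edge features are rotated.

```python
import numpy as np

def rotate_scene(edges, theta):
    """Replay a transition as if the whole scene were rotated by theta
    about the vertical (z) axis. Node features are unchanged because all
    entities were transformed identically; only the relative spatial
    edge features need to be rotated."""
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
    return {key: Rz @ disp for key, disp in edges.items()}

# One stored edge, then a 90-degree augmented copy of it.
edges = {("gripper", "obj_1"): np.array([1.0, 0.0, 0.0])}
augmented = rotate_scene(edges, np.pi / 2)
# Compounding transformations is just function composition; note that a
# global translation leaves relative displacements unchanged entirely.
```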
Object segmentation/tracker
As with VSGAT, we will build the graph nodes using an object segmenter. In that paper, Fast-RCNN was used, but better object segmenters/trackers have been designed since then, and we will need to integrate one of them into our work. The following link provides a nice overview of the SOTA: https://blog.netcetera.com/object-detection-and-tracking-in-2020-f10fb6ff9af3.
Things to Resolve
Note, we will have to explore exactly how to connect node entities (objects and gripper) and how to define the edges, i.e.:
- Edge Connection examples: (1) connect all entities to ALL entities (many-to-many); or (2) all objects and gripper only connect to the current target object (many-to-one).
- State definition: (1) keep relative pose transforms between all objects and the target object, and for the gripper and target object keep the HER state definition (the state dimensionality there is 25).
Roadmap
- Study robosuite.ai and robosuite-benchmarks
- Understand the standard SAC algorithm with RGB
- Study the 3 papers that use Graphs in the home page
- Propose a way to connect graphs with SAC in manipulation
- Code and Test