Reinforcement Learning Pipeline Custom Replay Buffer - HU-ICT-LAB/RobotWars GitHub Wiki

The problem

We intend to use Stable Baselines3[1] in the reinforcement learning pipeline because it provides a set of useful functions and algorithms (including SAC, which we want to use) for training and deploying reinforcement learning models.

It does, however, come with some challenges specific to our context. For example, the replay buffers, objects that store data in the format reinforcement learning expects (state, action, reward, next state, done), have functionality that is only available through methods inherited from the algorithm base classes (in our case OffPolicyAlgorithm[2]). These methods require an OpenAI Gym environment to run, because the built-in collect_rollouts() method steps the environment to gather experience on the spot. We do not need that, since our pipeline must also be able to use data from real-world training, which will already be available during training.
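To make the stored format concrete, here is a minimal sketch of a replay buffer holding (state, action, reward, next state, done) tuples. It is not the Stable Baselines3 implementation: the names `Transition` and `SimpleReplayBuffer` are hypothetical and exist only for illustration, and it uses nothing beyond the Python standard library.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One step of experience in the usual reinforcement learning format."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]
    done: bool


class SimpleReplayBuffer:
    """Fixed-size FIFO store of transitions (illustrative, not the SB3 class)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest item once it is full.
        self._storage = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


# Usage: five steps go into a buffer with room for three.
buffer = SimpleReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(Transition([float(step)], [0.0], 1.0, [float(step + 1)], step == 4))

batch = buffer.sample(batch_size=2)
```

Nothing in this sketch needs an environment: the buffer only stores and samples whatever transitions it is handed, which is why recorded real-world data can in principle be fed into it.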

The solution

To make SAC work in a real training environment, we need an altered version of the base algorithm class whose collect_rollouts() method does not use the environment. Instead, it retrieves the transitions from the database using SQL queries and uses the _store_transition() method to access and write to the replay buffer. This will be followed up in a future user story, in which we apply a Test Driven Development[3] approach to solving the problem.
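The idea of filling the buffer from stored data rather than from an environment could look roughly like the sketch below. It makes several assumptions that do not come from the project: the table name `transitions`, its columns, the JSON encoding of vectors, and the helper `load_transitions` are all hypothetical, an in-memory SQLite database stands in for the real database, and a plain `deque` stands in for the Stable Baselines3 replay buffer.

```python
import json
import sqlite3
from collections import deque


def load_transitions(connection, buffer, limit=1000):
    """Copy up to `limit` stored transitions from the database into the buffer."""
    rows = connection.execute(
        "SELECT state, action, reward, next_state, done "
        "FROM transitions ORDER BY id LIMIT ?",
        (limit,),
    )
    count = 0
    for state, action, reward, next_state, done in rows:
        # Vectors are assumed to be stored as JSON text (hypothetical schema).
        buffer.append(
            (
                json.loads(state),
                json.loads(action),
                float(reward),
                json.loads(next_state),
                bool(done),
            )
        )
        count += 1
    return count


# Stand-in database with two recorded transitions (hypothetical data).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE transitions ("
    "id INTEGER PRIMARY KEY, state TEXT, action TEXT, "
    "reward REAL, next_state TEXT, done INTEGER)"
)
connection.executemany(
    "INSERT INTO transitions (state, action, reward, next_state, done) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (json.dumps([0.0, 1.0]), json.dumps([0.5]), 1.0, json.dumps([0.1, 1.1]), 0),
        (json.dumps([0.1, 1.1]), json.dumps([-0.5]), -1.0, json.dumps([0.2, 1.2]), 1),
    ],
)
connection.commit()

replay_buffer = deque(maxlen=10_000)  # stand-in for the real replay buffer
loaded = load_transitions(connection, replay_buffer)
```

In the altered algorithm class, a loop of this shape would replace the environment stepping inside collect_rollouts(), with the `buffer.append(...)` call swapped for the library's own mechanism for writing a transition to the replay buffer.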

Sources

[1] Stable-Baselines3 Docs - Reliable Reinforcement Learning Implementations — Stable Baselines3 1.4.1a0 documentation. (n.d.). Stable Baselines3. Retrieved January 24, 2022, from https://stable-baselines3.readthedocs.io/en/master/index.html

[2] Stable-Baselines3 Docs - Reliable Reinforcement Learning Implementations — Stable Baselines3 1.4.1a0 documentation. (n.d.). Stable Baselines3. Retrieved January 24, 2022, from off_policy_algorithm.html

[3] What is Test Driven Development (TDD)? How to approach TDD. (2019, August 19). Agilest®. Retrieved January 24, 2022, from https://www.agilest.org/devops/test-driven-development/