Reinforcement Learning Models - HU-ICT-LAB/RobotWars GitHub Wiki

This page describes the different types of RL models we compared in order to arrive at our choice of model. First we will look at the categories of RL models. These categories are:

  1. Model-Based
  2. Model-Free
    2.A Q-Learning
    2.B Policy Optimization

Model-Based vs Model-Free

Model-Based RL uses information about the environment/world: the model plans ahead using a world model and picks an action based on that plan. Model-Free RL, on the other hand, learns its policy purely from what is observed in the state the agent is in, without world data. As an example, taken from the first Medium article in the source list: in a pathfinding problem, model-based RL reasons over the whole route you have taken, while model-free RL looks only at the actions taken at a certain location, without considering the whole route.

In this project we want the robot to react to what is happening around it, such as shooting back at another robot that shot at it. This is possible with Model-Free RL, which is why we will look deeper into Model-Free models by comparing Q-Learning and Policy Optimization.

Q-Learning vs Policy Optimization

In Q-learning, the goal is to learn a single deterministic action from a discrete set of actions by finding the action with the maximum Q-value (Stackoverflow, 2018, nbro). Policy Optimization instead learns the policy function that maps a state directly to an action (medium, 2021, s. Ai). This makes continuous actions possible.
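The Q-learning side of this comparison can be sketched in a few lines. This is a minimal tabular example with made-up sizes and parameter values (not from our project): the agent keeps a table of Q-values, updates one entry per step, and acts by taking the maximum over the discrete action set.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # one value per (state, action) pair
alpha, gamma = 0.1, 0.99             # learning rate, discount factor

def q_update(s, a, r, s_next):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)
# The learned behaviour is deterministic: pick the discrete action
# with the highest Q-value for the current state.
greedy_action = int(Q[0].argmax())
```

Note how the `argmax` at the end is exactly why plain Q-learning is limited to discrete actions: there is no maximum to enumerate over a continuous range of wheel speeds.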

The robot in this project does not just move left or right (discrete actions); it moves at a continuous speed in a direction. Since these values are continuous, we are looking more in the direction of Policy Optimization (although there are also models on the border of these two categories).
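To illustrate what a continuous action looks like in practice, here is a sketch of a common policy-optimization setup (assumed for illustration, not our actual robot code): the policy outputs a mean and standard deviation per action dimension, and a wheel speed is sampled from that Gaussian rather than picked from a discrete set.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_wheel_speeds(mean, std):
    """Sample continuous wheel speeds from a Gaussian policy,
    clipped to the motors' normalized range [-1, 1]."""
    speeds = rng.normal(mean, std)
    return np.clip(speeds, -1.0, 1.0)

# Hypothetical policy output for a two-wheeled robot:
# drive forward-right at moderate speed, with some exploration noise.
speeds = sample_wheel_speeds(mean=np.array([0.5, -0.2]), std=np.array([0.1, 0.1]))
```

The sampled values can take any value in the range, which is the kind of action space Q-learning's discrete `argmax` cannot handle directly.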

Models

The models we have researched are:

  1. Policy Gradient
  2. Actor Critic (A2C/A3C)
  3. Soft Actor Critic

Policy Gradient

Policy Gradient tries to maximize the expected reward of the policy by performing gradient ascent on the policy parameters.
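A REINFORCE-style sketch of that idea, using an assumed softmax policy over two actions (illustrative names and sizes, not project code): the parameters are nudged in the direction of the gradient of log pi(a|s), scaled by the observed return, which is a sample estimate of the policy gradient.

```python
import numpy as np

theta = np.zeros(2)  # one preference per action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_step(action, G, lr=0.1):
    """One policy-gradient ascent step: theta += lr * G * grad log pi(action)."""
    global theta
    probs = softmax(theta)
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0   # gradient of log softmax w.r.t. theta
    theta = theta + lr * G * grad_log_pi

# After seeing a positive return G for action 0, the policy
# shifts probability mass toward that action.
reinforce_step(action=0, G=1.0)
probs = softmax(theta)
```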

Actor Critic

Actor-Critic consists of two networks: the actor and the critic. The actor decides which action the agent takes, while the critic informs the actor whether this was a good action or not.
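The interaction between the two can be sketched in tabular form (an assumed minimal setup, not our implementation): the critic learns state values V(s), and its TD error tells the actor whether the chosen action turned out better or worse than expected.

```python
import numpy as np

n_states, n_actions = 3, 2
V = np.zeros(n_states)                   # critic: state-value estimates
prefs = np.zeros((n_states, n_actions))  # actor: action preferences

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def actor_critic_step(s, a, r, s_next, lr_v=0.1, lr_pi=0.1, gamma=0.99):
    """One-step actor-critic update."""
    td_error = r + gamma * V[s_next] - V[s]  # critic's verdict on the action
    V[s] += lr_v * td_error                  # critic learns better values
    grad = -softmax(prefs[s])
    grad[a] += 1.0                           # gradient of log softmax
    prefs[s] += lr_pi * td_error * grad      # actor update, scaled by the critique

actor_critic_step(s=0, a=1, r=1.0, s_next=1)
```

A positive TD error ("better than expected") raises the preference for the taken action; a negative one lowers it.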

Soft Actor-Critic (SAC)

The algorithm is based on maximum-entropy RL, where the objective is to find the optimal policy that maximizes both the expected long-term reward and the long-term entropy (medium, 2020, D. Karunakaran).
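The maximum-entropy objective can be sketched numerically (symbols assumed from the cited article, not SAC's full machinery): each step's reward is augmented with the entropy of the policy's action distribution, weighted by a temperature alpha, so the agent is rewarded for keeping its behaviour exploratory as well as for collecting reward.

```python
import numpy as np

alpha, gamma = 0.2, 0.99  # entropy temperature, discount factor

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a)."""
    return -np.sum(probs * np.log(probs))

def soft_return(rewards, action_probs):
    """Discounted sum of r_t + alpha * H(pi(.|s_t)) over an episode."""
    G = 0.0
    for t, (r, p) in enumerate(zip(rewards, action_probs)):
        G += gamma**t * (r + alpha * entropy(np.asarray(p)))
    return G

# Two-step episode: a uniform (high-entropy) policy step, then a
# near-deterministic (low-entropy) one. The entropy bonus lifts the
# return above the plain reward sum of 1.0.
G = soft_return(rewards=[1.0, 0.0], action_probs=[[0.5, 0.5], [0.9, 0.1]])
```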

Conclusion

As stated previously on this page, the type of model we need is Model-Free, and specifically the Policy Optimization branch, because the robot reacts to its environment and the actions are continuous. From the models in this branch we have chosen the Soft Actor-Critic model. This model is the most suited for our project because SAC is designed for RL tasks involving continuous actions (towardsdatascience, 2019, V. Kumar). We control the robot by continuously changing the speed of its wheels and rotating parts, which is exactly what SAC is made for.

Sources

  1. I. (2018, October 29). What is Model-Based Reinforcement Learning? - the integrate.ai blog. Medium. Retrieved December 16, 2021, from https://medium.com/the-official-integrate-ai-blog/understanding-reinforcement-learning-93d4e34e5698
  2. Ai, S. (2021, December 7). Reinforcement Learning algorithms — an intuitive overview. Medium. Retrieved December 16, 2021, from https://smartlabai.medium.com/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc
  3. What is the relation between Q-learning and policy gradients methods? (2018, April 28). Artificial Intelligence Stack Exchange. Retrieved December 16, 2021, from https://ai.stackexchange.com/questions/6196/what-is-the-relation-between-q-learning-and-policy-gradients-methods/6199
  4. Kapoor, S. (2018, June 21). Policy Gradients in a Nutshell - Towards Data Science. Medium. Retrieved December 23, 2021, from https://towardsdatascience.com/policy-gradients-in-a-nutshell-8b72f9743c5d
  5. Karunakaran, D. (2021, December 16). The Actor-Critic Reinforcement Learning algorithm - Intro to Artificial Intelligence. Medium. Retrieved January 10, 2022, from https://medium.com/intro-to-artificial-intelligence/the-actor-critic-reinforcement-learning-algorithm-c8095a655c14
  6. Kumar, V. (2021, December 7). Soft Actor-Critic Demystified - Towards Data Science. Medium. Retrieved January 10, 2022, from https://towardsdatascience.com/soft-actor-critic-demystified-b8427df61665
  7. Karunakaran, D. (2021b, December 22). Soft Actor-Critic Reinforcement Learning algorithm - Intro to Artificial Intelligence. Medium. Retrieved January 10, 2022, from https://medium.com/intro-to-artificial-intelligence/soft-actor-critic-reinforcement-learning-algorithm-1934a2c3087f

Related issues

Issues: #112