Quantum Reinforcement Learning (QRL)
Quantum reinforcement learning (QRL) is a field that combines the principles of reinforcement learning (RL) with quantum computing. Reinforcement learning is a branch of machine learning in which an agent learns to make sequential decisions in an environment so as to maximize a reward signal. QRL aims to leverage the unique properties of quantum systems, such as superposition and entanglement, to make the reinforcement learning process more efficient and effective. The key idea is to use quantum algorithms and quantum computing devices to improve the performance of RL algorithms on complex problems.
There are several reinforcement learning algorithms; some of the most prominent are:
- Q-Learning: A widely used off-policy RL algorithm. It learns an action-value function, the Q-function, that estimates the expected return from taking a specific action in a given state. Q-Learning improves its Q-values with an iterative update rule based on the observed reward and the maximum Q-value of the next state (see the tabular sketch after this list).
- Deep Q-Networks (DQN): A variant of Q-Learning that uses deep neural networks to approximate the Q-function. It combines Q-Learning with deep learning techniques to handle high-dimensional state spaces, and it introduces experience replay and a target network to stabilize learning (a minimal sketch also follows this list).
- SARSA: An on-policy RL algorithm whose name stands for State-Action-Reward-State-Action. Like Q-Learning, SARSA estimates the action-value function, but it updates its Q-values using the action actually taken in the next state rather than the maximum over all next actions.
- Actor-Critic Methods: These methods combine elements of value-based and policy-based approaches. They maintain both a policy network (the actor) and a value-function network (the critic). The actor selects actions according to the policy, while the critic estimates the value of the current state. Examples include Advantage Actor-Critic (A2C) and Asynchronous Advantage Actor-Critic (A3C).
- Monte Carlo Methods: These methods estimate the value function or policy by averaging the returns computed from complete episodes of interaction with the environment, from the initial state to a terminal state.
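To make the Q-Learning update rule concrete, here is a minimal tabular sketch. The environment interface (`env.reset()` and `env.step(action)` returning `(next_state, reward, done)`) and the hyperparameter values are illustrative assumptions rather than part of any particular library; a comment inside the loop notes where SARSA would differ.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-Learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy exploration over the current Q estimates.
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Off-policy target: maximum Q-value of the next state.
            # SARSA would instead use the action actually chosen in next_state.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```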
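The DQN entry above mentions a Q-network, experience replay, and a target network; the sketch below shows, assuming PyTorch is available, how those pieces commonly fit together. Network sizes, hyperparameters, and the surrounding training loop are illustrative choices, not a reference implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class DQNAgent:
    def __init__(self, state_dim, n_actions, gamma=0.99, lr=1e-3):
        self.gamma = gamma
        self.n_actions = n_actions
        # Online network approximates Q(s, a); the target network is a periodically synced copy.
        self.q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        self.target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                        nn.Linear(64, n_actions))
        self.target_net.load_state_dict(self.q_net.state_dict())
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        # Experience replay buffer; the training loop appends (s, a, r, s', done) tuples.
        self.replay = deque(maxlen=10_000)

    def act(self, state, epsilon=0.1):
        # Epsilon-greedy action selection over the online network's Q-values.
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def train_step(self, batch_size=32):
        if len(self.replay) < batch_size:
            return
        batch = random.sample(self.replay, batch_size)
        states, actions, rewards, next_states, dones = map(
            lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
        q_values = self.q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Bootstrapped target computed from the frozen target network for stability.
            targets = rewards + self.gamma * self.target_net(next_states).max(1).values * (1 - dones)
        loss = nn.functional.mse_loss(q_values, targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
```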
There are also several quantum reinforcement learning algorithms; here are some prominent ones:
- Quantum Approximate Policy Iteration (QAPI): A QRL algorithm that combines classical RL techniques with quantum computing. It uses a quantum circuit to represent the policy and employs classical optimization methods to update the policy parameters based on the estimated values of the quantum policy.
- Quantum Advantage Actor-Critic (QAAC): A QRL algorithm based on the actor-critic framework. It uses a quantum circuit to represent the actor and estimates the critic using quantum techniques; the two are updated iteratively to improve the policy and value estimates.
- Quantum Advantage SARSA (QASARSA): A quantum extension of the SARSA algorithm. It uses quantum circuits to represent the policy, estimates the action-value function with quantum methods, and updates the policy and action-value estimates based on the observed rewards and the next quantum state-action pair.
- Quantum Monte Carlo Tree Search (QMCTS): A QRL algorithm that combines quantum mechanics with Monte Carlo Tree Search and reinforcement learning. It uses quantum techniques to simulate and evaluate candidate actions in a tree-like search structure, enabling more efficient exploration and exploitation of the action space.
- Quantum Deep Q-Network (QDQN): A quantum extension of the Deep Q-Network (DQN) algorithm. It uses quantum circuits to approximate the Q-function and leverages quantum algorithms for efficient optimization and exploration (a variational-circuit sketch follows this list).
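As a rough illustration of how a parameterized quantum circuit can play the role of the Q-function in a QDQN-style setup (or the policy in QAPI-style approaches), here is a small sketch that assumes the PennyLane library. The state encoding, the two-qubit/two-layer circuit shape, and the use of one Pauli-Z expectation value per action are illustrative assumptions, not a prescription from a specific paper.

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2  # one measured qubit per action in this two-action toy problem
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def q_circuit(params, state):
    # Encode the classical state into single-qubit rotation angles.
    for i in range(n_qubits):
        qml.RY(state[i], wires=i)
    # Variational layers: trainable rotations followed by entanglement.
    for layer in params:
        for i in range(n_qubits):
            qml.RZ(layer[i, 0], wires=i)
            qml.RY(layer[i, 1], wires=i)
        qml.CNOT(wires=[0, 1])
    # One Pauli-Z expectation value per action serves as an approximate Q-value.
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

params = np.array(np.random.uniform(0, np.pi, size=(2, n_qubits, 2)),
                  requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.1)

def td_loss(params, state, action, reward, next_state, gamma=0.99):
    # Squared temporal-difference error between the circuit's Q-value and a
    # bootstrapped target, mirroring the classical DQN loss; the target is
    # treated as a constant, so it is detached via float().
    q_sa = q_circuit(params, state)[action]
    target = reward + gamma * max(float(q) for q in q_circuit(params, next_state))
    return (target - q_sa) ** 2

# One illustrative update on a dummy transition (state, action, reward, next_state).
state = np.array([0.3, -0.7], requires_grad=False)
next_state = np.array([0.1, 0.4], requires_grad=False)
params = opt.step(lambda p: td_loss(p, state, 0, 1.0, next_state), params)
```

Because the circuit is differentiable (via backpropagation on a simulator, or the parameter-shift rule on hardware), the same gradient-based temporal-difference update used classically can be applied directly to the circuit parameters.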