Glossary - KunjShah01/RL-A2A GitHub Wiki
Glossary
Actor: The component of an RL algorithm responsible for selecting actions.
Advantage: The difference between the expected return of taking a specific action in a state and the expected return from that state under the current policy, i.e. A(s, a) = Q(s, a) - V(s).
Critic: Estimates the value function, used to evaluate the quality of actions taken by the actor.
Environment: The simulation or real-world system with which the RL agent interacts.
Episode: A single run from an environment reset to a terminal state.
Policy: A function or model that maps states to actions.
Replay Buffer: Storage for past experiences (state, action, reward, next state), sampled during training by off-policy RL algorithms.
Reward: The feedback signal received from the environment.
Value Function: Estimates the expected cumulative (typically discounted) future reward from a given state under the current policy.
A2A: Actor-to-Actor, a variant in the Actor-Critic family allowing multiple actors.
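A2C: Advantage Actor-Critic, a synchronous actor-critic algorithm that updates the policy using advantage estimates.
PPO: Proximal Policy Optimization, a policy-gradient algorithm that limits how far each update can move the policy.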
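To make the Reward, Value Function, and Advantage entries above concrete, here is a minimal Python sketch of how a simple advantage estimate could be computed for one episode. The function names, the discount factor `gamma`, and the use of the full discounted return as the target are assumptions chosen for illustration; they are not taken from the RL-A2A code, and practical implementations often use other estimators.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards: List[float], values: List[float],
               gamma: float = 0.99) -> List[float]:
    """Estimate A_t = G_t - V(s_t), using the critic's values as the baseline."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]
```

A positive advantage means the action led to a better outcome than the critic predicted for that state; a negative one means it did worse.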
See Algorithms Explained for more details.
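As an illustration of the Replay Buffer entry, the following is a minimal sketch of a fixed-capacity buffer with uniform random sampling. The `Transition` and `ReplayBuffer` names and their fields are hypothetical and are not drawn from this project's code; they only show the general shape of the data structure.

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple


class Transition(NamedTuple):
    """One step of experience (hypothetical field layout)."""
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity store that discards the oldest transition when full."""

    def __init__(self, capacity: int) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        # deque(maxlen=...) drops the oldest item automatically once full.
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement; raises ValueError if the
        # buffer holds fewer than batch_size transitions.
        return random.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

Sampling at random rather than replaying transitions in order reduces the correlation between consecutive training examples.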