Glossary - KunjShah01/RL-A2A GitHub Wiki

Glossary

Actor: The component of an RL algorithm responsible for selecting actions.

Advantage: The difference between the expected return of taking a specific action in a state and the expected return of that state under the current policy, often written A(s, a) = Q(s, a) - V(s).

Critic: The component that estimates the value function, which is used to evaluate the actions selected by the actor.

Environment: The simulation or real-world system with which the RL agent interacts.

Episode: A single run from environment reset to terminal state.

Policy: A function or model that maps states to actions, or to a probability distribution over actions.

Replay Buffer: Storage for past experiences (transitions), from which off-policy algorithms such as DQN sample during training.

Reward: The scalar feedback signal the agent receives from the environment after taking an action.

Value Function: Predicts the expected cumulative (typically discounted) future reward from a given state when following a given policy.
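
The interaction terms above (environment, episode, policy, reward, replay buffer) can be tied together in a short sketch. Everything here is hypothetical and for illustration only: `CoinFlipEnv`, `random_policy`, and `run_episode` are made-up names, not part of RL-A2A, and the `reset`/`step` interface is only loosely modeled on common RL environment APIs.

```python
import random
from collections import deque


class CoinFlipEnv:
    """Hypothetical toy environment: guess a coin flip for a fixed number of steps."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial state (a single dummy state).
        self.steps = 0
        return 0

    def step(self, action):
        # Reward is 1.0 for a correct guess, 0.0 otherwise.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps  # terminal state reached
        return 0, reward, done


def random_policy(state, rng):
    # A policy maps a state to an action; this one ignores the state.
    return rng.randint(0, 1)


def run_episode(env, policy, buffer, rng):
    # One episode: from environment reset until a terminal state.
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = policy(state, rng)
        next_state, reward, done = env.step(action)
        # Store the transition so it can be sampled later for training.
        buffer.append((state, action, reward, next_state, done))
        total_reward += reward
        state = next_state
    return total_reward


replay_buffer = deque(maxlen=1000)  # oldest experiences are dropped when full
rng = random.Random(1)
env = CoinFlipEnv(max_steps=5, seed=0)
episode_return = run_episode(env, random_policy, replay_buffer, rng)
batch = random.sample(list(replay_buffer), k=3)  # a minibatch of past experiences
```

A bounded `deque` is used here only because it is the simplest standard-library container that discards old transitions; real implementations often use preallocated arrays instead.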

A2A: Actor-to-Actor, a variant in the Actor-Critic family allowing multiple actors.

A2C: Advantage Actor-Critic, a synchronous actor-critic algorithm that uses advantage estimates to update the policy.

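PPO: Proximal Policy Optimization, a policy-gradient algorithm that limits how far each update can move the policy, commonly via a clipped objective.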
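
As a rough illustration of how the advantage feeds into these algorithms, the sketch below computes one common one-step advantage estimate and the per-sample clipped surrogate term used by PPO. The function names, the default `gamma=0.99`, and the default `clip_eps=0.2` are assumptions chosen for the example, not values taken from RL-A2A.

```python
def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    # One-step estimate: r + gamma * V(s') - V(s).
    # No bootstrapping from V(s') when the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def ppo_clipped_term(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s).
    # The clipped surrogate takes the more pessimistic of the two terms,
    # so large policy changes gain no extra objective value.
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


adv = one_step_advantage(reward=1.0, value=0.5, next_value=0.0, done=True)
term = ppo_clipped_term(ratio=1.5, advantage=adv)
```

In practice the advantage is often estimated over several steps and the surrogate is averaged over a batch; the scalar version above only shows the shape of the computation.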

See Algorithms Explained for more details.