Background: Assumptions - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki
Assumption (MDP ergodicity). A finite MDP is called ergodic under a policy if the Markov chain it induces on state-action pairs satisfies:
- Irreducibility: Starting from any state-action pair, it is possible to reach any other state-action pair within a finite number of steps with non-zero probability.
- Aperiodicity: For every state-action pair the greatest common divisor of all return times to this state-action pair is one.
- Positive Recurrence: The expected return time to any state-action pair is finite.
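For a finite chain these conditions can be checked numerically. A minimal sketch in NumPy (the matrices and helper names below are illustrative, not part of the repository): irreducibility of an n-state chain is equivalent to (I + P)^(n-1) having strictly positive entries, and the period at a state is the gcd of the step counts at which return is possible. For a finite chain, irreducibility already implies positive recurrence.

```python
import numpy as np
from math import gcd
from functools import reduce

def is_irreducible(P):
    """Irreducible iff every pair of states communicates; for an n-state
    chain this is equivalent to (I + P)^(n-1) being strictly positive."""
    n = P.shape[0]
    reach = np.linalg.matrix_power(np.eye(n) + P, n - 1)
    return bool(np.all(reach > 0))

def period(P, i, max_power=50):
    """gcd of the return times to state i, scanned up to max_power steps
    (sufficient for small chains); a value of 1 means aperiodic at i."""
    times = [k for k in range(1, max_power + 1)
             if np.linalg.matrix_power(P, k)[i, i] > 0]
    return reduce(gcd, times) if times else 0

# Illustrative 2-state chains (hypothetical numbers):
P_good = np.array([[0.5, 0.5],
                   [0.3, 0.7]])   # irreducible and aperiodic
P_cycle = np.array([[0.0, 1.0],
                    [1.0, 0.0]])  # irreducible but periodic (period 2)
```

The deterministic 2-cycle `P_cycle` shows why aperiodicity is a separate requirement: every state is reachable from every other, yet returns only happen at even times.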
Assumption (behavior policy coverage).
The evaluation policy must be covered by the behavior policy: for every state s and action a, π(a|s) > 0 implies π_b(a|s) > 0, so that every state-action pair the evaluation policy can visit occurs with non-zero probability under the behavior policy.
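For tabular policies, the standard behavior-coverage condition (π_e(a|s) > 0 implies π_b(a|s) > 0) can be checked directly. A minimal sketch, assuming policies are stored as |S| × |A| arrays of action probabilities; the names `covers`, `pi_b`, and `pi_e` are illustrative:

```python
import numpy as np

def covers(pi_b, pi_e, tol=0.0):
    """True iff pi_e(a|s) > 0 implies pi_b(a|s) > 0, checked elementwise
    over |S| x |A| arrays of action probabilities."""
    return bool(np.all((pi_e <= tol) | (pi_b > tol)))

# Illustrative tabular policies over 2 states and 2 actions:
pi_b = np.array([[0.5, 0.5],
                 [0.2, 0.8]])   # behavior policy, fully supported
pi_e = np.array([[1.0, 0.0],
                 [0.0, 1.0]])   # deterministic evaluation policy
```

Note the condition is asymmetric: a fully supported behavior policy covers any evaluation policy, but a deterministic policy covers almost nothing.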
Assumption (dataset coverage).
The stationary distribution d^D from which the dataset is sampled must cover the evaluation policy's stationary distribution: d^π(s, a) > 0 implies d^D(s, a) > 0 for every state-action pair (s, a).
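In a small tabular problem, dataset coverage can be verified by computing the stationary distribution of the chain that generated the data and comparing supports. A minimal sketch (all names and numbers are illustrative assumptions, not repository code): the stationary distribution of an ergodic chain is the left eigenvector of the transition matrix for eigenvalue 1, normalized to sum to one.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic chain: the left eigenvector
    of P for eigenvalue 1, made positive and normalized to sum to 1."""
    w, V = np.linalg.eig(P.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    v = np.abs(v)               # Perron vector is positive up to sign
    return v / v.sum()

def dataset_covers(d_D, d_pi, tol=0.0):
    """True iff d_pi(s, a) > 0 implies d_D(s, a) > 0, elementwise."""
    return bool(np.all((d_pi <= tol) | (d_D > tol)))

# Illustrative 2-state chain induced by the data-generating process:
P_b = np.array([[0.5, 0.5],
                [0.3, 0.7]])
d_D = stationary(P_b)           # ≈ [0.375, 0.625]
```

Since `P_b` is irreducible and aperiodic, `d_D` is strictly positive, so this dataset covers any evaluation distribution on the same state space.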