# Home
Welcome to the dice_rl_TU_Vienna wiki! 🥳 Here we provide documentation for our policy evaluation API.
## ⚠️ A Technical Remark
Note that the policy value is defined as
$$ \rho^\pi \doteq (1 - \gamma) \mathcal E_\pi \left [ \sum_{t=0}^\infty \gamma^t r_t \right ] \quad \text{for } 0 < \gamma < 1 \quad \text{and} \quad \rho^\pi \doteq \lim_{H \to \infty} \mathcal E_\pi \left [ \frac{1}{H+1} \sum_{t=0}^H r_t \right ] \quad \text{for } \gamma = 1. $$
Here, $\gamma$ is the discount factor, $\pi$ is the evaluation policy, and $r_t$ is the reward at time $t$.
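For intuition (this snippet is illustrative only and not part of the library's API), both definitions can be estimated by plain Monte Carlo from a sampled reward trajectory; `rewards` and `gamma` are placeholder names:

```python
import numpy as np

def discounted_policy_value(rewards, gamma):
    """Monte Carlo estimate of rho^pi for 0 < gamma < 1 from one reward trajectory."""
    discounts = gamma ** np.arange(len(rewards))
    return (1 - gamma) * np.sum(discounts * rewards)

def average_policy_value(rewards):
    """Monte Carlo estimate of rho^pi for gamma = 1 (long-run average reward)."""
    return np.mean(rewards)

# A constant reward of 1 yields a policy value of 1 under both definitions.
rewards = np.ones(10_000)
print(discounted_policy_value(rewards, gamma=0.99))  # 1 - 0.99**10000, i.e. ~1.0
print(average_policy_value(rewards))                 # 1.0
```

The $(1 - \gamma)$ normalization and the $\frac{1}{H+1}$ average put both definitions on the same scale, which is why the constant-reward example returns 1 in both cases.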
DICE-based methods are designed for infinite-horizon settings. If your environment terminates after a finite horizon, consider looping it or modeling termination with absorbing states to better reflect infinite-horizon assumptions.
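As a rough sketch of the absorbing-state approach (assuming a Gymnasium-style `reset`/`step` interface; the wrapper and its names are illustrative and not part of this library):

```python
import gymnasium as gym

class AbsorbingWrapper(gym.Wrapper):
    """Once the wrapped episode ends, keep returning the last observation with
    zero reward, so trajectories never terminate from the agent's point of view."""

    def __init__(self, env):
        super().__init__(env)
        self._absorbed = False
        self._last_obs = None

    def reset(self, **kwargs):
        self._absorbed = False
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        if self._absorbed:
            # Absorbing state: observation is frozen, reward is zero.
            return self._last_obs, 0.0, False, False, {}
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_obs = obs
        self._absorbed = terminated or truncated
        # Never report termination to the caller.
        return obs, reward, False, False, info
```

Looping the environment instead would amount to calling `reset` inside `step` whenever the underlying episode ends, so that data collection continues without interruption.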
Before using the library in depth, we strongly recommend reading the documentation carefully — especially the Background section — to understand key assumptions and concepts. You may also benefit from reviewing the example project linked below for a concrete application.
## 📚 Documentation Overview
Jump directly to:
- Background — Key assumptions, estimators, and Bellman equations.
- Dataset and Policies — Required dataset structure and policy representation.
- Hyperparameters — Configuration details for DICE estimators.
- Algorithms — List of implemented algorithms and their expected input formats.
## 🔬 Application Example
For a practical application of these estimators in the healthcare domain, see our related repository:
👉 dice_rl_sepsis — Code and experiments for the publication *Evaluating Reinforcement-Learning-based Sepsis Treatments via Tabular and Continuous Stationary Distribution Correction Estimation*.