# Home
Welcome to the dice_rl_TU_Vienna wiki! 🥳 Here we provide documentation for our policy evaluation API.
## ⚠️ A Technical Remark
Note that the policy value is defined as
$$ \rho^\pi \doteq (1 - \gamma) \mathcal E_\pi \left [ \sum_{t=0}^\infty \gamma^t r_t \right ] \quad \text{for } 0 < \gamma < 1 \quad \text{and} \quad \rho^\pi \doteq \lim_{H \to \infty} \mathcal E_\pi \left [ \frac{1}{H+1} \sum_{t=0}^H r_t \right ] \quad \text{for } \gamma = 1. $$
Here, $\gamma$ is the discount factor, $\pi$ is the evaluation policy, and $r_t$ is the reward at time $t$.
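For intuition (this snippet is illustrative only and not part of the library's API), both definitions can be estimated by plain Monte Carlo from a sampled reward trajectory; `rewards` and `gamma` are placeholder names:

```python
import numpy as np

def discounted_policy_value(rewards, gamma):
    """Monte Carlo estimate of rho^pi for 0 < gamma < 1 from one reward trajectory."""
    discounts = gamma ** np.arange(len(rewards))
    return (1 - gamma) * np.sum(discounts * rewards)

def average_policy_value(rewards):
    """Monte Carlo estimate of rho^pi for gamma = 1 (long-run average reward)."""
    return np.mean(rewards)

# A constant reward of 1 yields a policy value of 1 under both definitions.
rewards = np.ones(10_000)
print(discounted_policy_value(rewards, gamma=0.99))  # 1 - 0.99**10000, i.e. ~1.0
print(average_policy_value(rewards))                 # 1.0
```

The $(1 - \gamma)$ normalization and the $\frac{1}{H+1}$ average put both definitions on the same scale, which is why the constant-reward example returns 1 in both cases.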
DICE-based methods are designed for infinite-horizon settings. If your environment terminates after a finite horizon, consider looping it or modeling termination with absorbing states to better reflect infinite-horizon assumptions.
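As a rough sketch of the absorbing-state approach (assuming a Gymnasium-style `reset`/`step` interface; the wrapper and its names are illustrative and not part of this library):

```python
import gymnasium as gym

class AbsorbingWrapper(gym.Wrapper):
    """Once the wrapped episode ends, keep returning the last observation with
    zero reward, so trajectories never terminate from the agent's point of view."""

    def __init__(self, env):
        super().__init__(env)
        self._absorbed = False
        self._last_obs = None

    def reset(self, **kwargs):
        self._absorbed = False
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, action):
        if self._absorbed:
            # Absorbing state: observation is frozen, reward is zero.
            return self._last_obs, 0.0, False, False, {}
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._last_obs = obs
        self._absorbed = terminated or truncated
        # Never report termination to the caller.
        return obs, reward, False, False, info
```

Looping the environment instead would amount to calling `reset` inside `step` whenever the underlying episode ends, so that data collection continues without interruption.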
Before using the library in depth, we strongly recommend reading the documentation carefully — especially the Background section — to understand key assumptions and concepts. You may also benefit from reviewing the example project linked below for a concrete application.
## 📚 Documentation Overview
Jump directly to:
- Background — Key assumptions, estimators, and Bellman equations.
- Dataset and Policies — Required dataset structure and policy representation.
- Hyperparameters — Configuration details for DICE estimators.
- Algorithms — List of implemented algorithms and their expected input formats.
## 🔬 Application Example
For a practical application of these estimators in the healthcare domain, see our related repository:
👉 dice_rl_sepsis — Code and experiments for the publication *Evaluating Reinforcement-Learning-based Sepsis Treatments via Tabular and Continuous Stationary Distribution Correction Estimation*.