API 1.0.3.1. ModelBasedTabularOffPE

ModelBasedTabularOffPE (Abstract Base Class)

The ModelBasedTabularOffPE class extends TabularOffPE with a model-based approach. It performs off-policy evaluation in tabular settings using empirical estimates of the initial distribution, transition dynamics, and rewards. Concretely, it estimates the following quantities (how they fit together is sketched after the list):

  • d0 the initial state-action distribution $d_0^\pi$,
  • dD the state-action dataset distribution $d^D$,
  • P the transition matrix $P^\pi$, and
  • r the expected rewards $r$.
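
These are the standard ingredients of model-based off-policy evaluation. As a rough illustration only, and not necessarily the exact computation this class performs, in the discounted setting with an assumed discount factor $\gamma$ the normalized discounted state-action occupancy of the target policy satisfies the Bellman flow equation, and the policy value is the average reward under that occupancy:

$$
d^\pi = (1 - \gamma)\, d_0^\pi + \gamma\, (P^\pi)^\top d^\pi,
\qquad
\rho^\pi = \sum_{s, a} d^\pi(s, a)\, r(s, a).
$$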

๐Ÿ—๏ธ Constructor

def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0, auxiliary_estimates=None):

Args:

  • dataset (pd.DataFrame): Dataset with columns:
    • obs_init (int or NDArray[float]): Initial observation (state) $s_0$.
    • obs (int or NDArray[float]): Current observation (state) $s$.
    • act (int): Action $a$.
    • rew (float): Reward $R(s, a, s')$.
    • obs_next (int or NDArray[float]): Next observation (state) $s'$.
    • probs_next_evaluation or probs_next (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
  • n_obs (int): Number of states $|S|$.
  • n_act (int): Number of actions $|A|$.
  • path (str, optional): Path to load/save estimates.
  • verbosity (int): Verbosity level for logging.
  • auxiliary_estimates (AuxiliaryEstimates, optional): Precomputed estimates.

If auxiliary_estimates is not provided, a new instance of AuxiliaryEstimates will be instantiated.
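
For orientation, here is a minimal sketch of a dataset with the expected columns, assuming integer-coded states, two actions, and purely illustrative values (the column name probs_next_evaluation can be used instead of probs_next, per the table above):

import numpy as np
import pandas as pd

# Toy transitions; column names follow the table above, values are made up.
dataset = pd.DataFrame({
    "obs_init": [0, 0, 0],         # initial state s_0 of each trajectory
    "obs":      [0, 3, 7],         # current state s
    "act":      [1, 0, 1],         # action a
    "rew":      [0.0, 1.0, 0.5],   # reward R(s, a, s')
    "obs_next": [3, 7, 2],         # next state s'
    "probs_next": [                # target-policy probabilities pi(. | s')
        np.array([0.9, 0.1]),
        np.array([0.5, 0.5]),
        np.array([0.2, 0.8]),
    ],
})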

🧪 Example Usage

from dice_rl_TU_Vienna.estimators.tabular.model_based import ModelBasedTabularOffPE

mb_solver = ModelBasedTabularOffPE(dataset, n_obs=20, n_act=5, path="./cache", verbosity=1)

# Access precomputed estimates
aux = mb_solver.auxiliary_estimates
d0_bar, dD_bar, P_bar, r_bar, n = aux.bar
d0_hat, dD_hat, P_hat, r_hat = aux.hat
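
The tuple unpacked from aux.hat presumably holds normalized estimates of the quantities listed at the top of this page. As a hedged follow-up sketch (the shapes of the estimates and the discount factor gamma are assumptions here, not part of the documented API), they could be combined into a model-based value estimate via the Bellman flow relation shown earlier:

import numpy as np

gamma = 0.99  # assumed discount factor, not a constructor argument above

# Assuming d0_hat, dD_hat, r_hat are vectors over state-action pairs and
# P_hat is a square transition matrix over state-action pairs under the
# target policy; check the library for the actual shapes.
n_sa = len(d0_hat)

# Solve d = (1 - gamma) * d0 + gamma * P^T d for the discounted occupancy.
d_pi = np.linalg.solve(np.eye(n_sa) - gamma * P_hat.T, (1 - gamma) * d0_hat)

# Average reward under that occupancy gives a model-based policy value.
rho_hat = float(d_pi @ r_hat)

# Distribution correction ratios w(s, a) = d^pi(s, a) / d^D(s, a),
# guarding against pairs never seen in the dataset.
w_hat = np.divide(d_pi, dD_hat, out=np.zeros_like(d_pi), where=dD_hat > 0)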