API 1.0.3.1. ModelBasedTabularOffPE

ModelBasedTabularOffPE (Abstract Base Class)

The ModelBasedTabularOffPE class extends TabularOffPE with a model-based approach. It performs off-policy evaluation in tabular settings using empirical estimates of the initial distribution, transition dynamics, and rewards. Concretely, it estimates the following quantities (how they fit together is sketched after the list):

  • d0 the initial state-action distribution $d_0^\pi$,
  • dD the state-action dataset distribution $d^D$,
  • P the transition matrix $P^\pi$, and
  • r the expected rewards $r$.
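
These are the standard ingredients of model-based off-policy evaluation. As a rough illustration only, and not necessarily the exact computation this class performs, in the discounted setting with an assumed discount factor $\gamma$ the normalized discounted state-action occupancy of the target policy satisfies the Bellman flow equation, and the policy value is the average reward under that occupancy:

$$
d^\pi = (1 - \gamma)\, d_0^\pi + \gamma\, (P^\pi)^\top d^\pi,
\qquad
\rho^\pi = \sum_{s, a} d^\pi(s, a)\, r(s, a).
$$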

๐Ÿ—๏ธ Constructor

def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0, auxiliary_estimates=None):

Args:

  • dataset (pd.DataFrame): Dataset with columns:
    • obs_init (int or NDArray[float]): Initial observation (state) $s_0$.
    • obs (int or NDArray[float]): Current observation (state) $s$.
    • act (int): Action $a$.
    • rew (float): Reward $R(s, a, s')$.
    • obs_next (int or NDArray[float]): Next observation (state) $s'$.
    • probs_next_evaluation or probs_next (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
  • n_obs (int): Number of states $|S|$.
  • n_act (int): Number of actions $|A|$.
  • path (str, optional): Path to load/save estimates.
  • verbosity (int): Verbosity level for logging.
  • auxiliary_estimates (AuxiliaryEstimates, optional): Precomputed estimates.

If auxiliary_estimates is not provided, a new instance of AuxiliaryEstimates will be instantiated.
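
For orientation, here is a minimal sketch of a dataset with the expected columns, assuming integer-coded states, two actions, and purely illustrative values (the column name probs_next_evaluation can be used instead of probs_next, per the table above):

import numpy as np
import pandas as pd

# Toy transitions; column names follow the table above, values are made up.
dataset = pd.DataFrame({
    "obs_init": [0, 0, 0],         # initial state s_0 of each trajectory
    "obs":      [0, 3, 7],         # current state s
    "act":      [1, 0, 1],         # action a
    "rew":      [0.0, 1.0, 0.5],   # reward R(s, a, s')
    "obs_next": [3, 7, 2],         # next state s'
    "probs_next": [                # target-policy probabilities pi(. | s')
        np.array([0.9, 0.1]),
        np.array([0.5, 0.5]),
        np.array([0.2, 0.8]),
    ],
})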

🧪 Example Usage

from dice_rl_TU_Vienna.estimators.tabular.model_based import ModelBasedTabularOffPE

mb_solver = ModelBasedTabularOffPE(dataset, n_obs=20, n_act=5, path="./cache", verbosity=1)

# Access precomputed estimates
aux = mb_solver.auxiliary_estimates
d0_bar, dD_bar, P_bar, r_bar, n = aux.bar
d0_hat, dD_hat, P_hat, r_hat = aux.hat
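
The tuple unpacked from aux.hat presumably holds normalized estimates of the quantities listed at the top of this page. As a hedged follow-up sketch (the shapes of the estimates and the discount factor gamma are assumptions here, not part of the documented API), they could be combined into a model-based value estimate via the Bellman flow relation shown earlier:

import numpy as np

gamma = 0.99  # assumed discount factor, not a constructor argument above

# Assuming d0_hat, dD_hat, r_hat are vectors over state-action pairs and
# P_hat is a square transition matrix over state-action pairs under the
# target policy; check the library for the actual shapes.
n_sa = len(d0_hat)

# Solve d = (1 - gamma) * d0 + gamma * P^T d for the discounted occupancy.
d_pi = np.linalg.solve(np.eye(n_sa) - gamma * P_hat.T, (1 - gamma) * d0_hat)

# Average reward under that occupancy gives a model-based policy value.
rho_hat = float(d_pi @ r_hat)

# Distribution correction ratios w(s, a) = d^pi(s, a) / d^D(s, a),
# guarding against pairs never seen in the dataset.
w_hat = np.divide(d_pi, dD_hat, out=np.zeros_like(d_pi), where=dD_hat > 0)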