# ModelBasedTabularOffPE (Abstract Base Class)
The `ModelBasedTabularOffPE` class extends `TabularOffPE` using a model-based approach. It relies on empirical estimates of initial distributions, transition dynamics, and rewards to perform off-policy evaluation in tabular settings, by estimating

- `d0`: the initial state-action distribution $d_0^\pi$,
- `dD`: the state-action dataset distribution $d^D$,
- `P`: the transition matrix $P^\pi$, and
- `r`: the expected rewards $r$.
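
As a rough illustration of how such quantities can be obtained from a dataset, the following is a minimal counting-based sketch, not the library's actual implementation. It assumes integer-encoded observations, a `probs_next` column holding $\pi(\cdot \mid s')$, and flattens each state-action pair to a single index `i = s * n_act + a`.

```python
import numpy as np
import pandas as pd

def empirical_estimates(dataset: pd.DataFrame, n_obs: int, n_act: int):
    # Illustrative sketch only; column names and array shapes are assumptions.
    n_sa = n_obs * n_act                 # flattened state-action index i = s * n_act + a

    dD_count = np.zeros(n_sa)            # state-action visitation counts
    P_count  = np.zeros((n_sa, n_sa))    # transition counts, weighted by the target policy
    r_sum    = np.zeros(n_sa)            # accumulated rewards per state-action pair

    for row in dataset.itertuples(index=False):
        i = row.obs * n_act + row.act
        dD_count[i] += 1
        r_sum[i]    += row.rew
        # distribute the transition mass over next actions according to pi(. | s')
        for a_next, p in enumerate(row.probs_next):
            P_count[i, row.obs_next * n_act + a_next] += p

    # d0 would be estimated analogously from obs_init, weighted by the
    # target policy's action probabilities at the initial states.
    n = dD_count.sum()
    dD_hat = dD_count / max(n, 1)
    r_hat = np.divide(r_sum, dD_count, out=np.zeros(n_sa), where=dD_count > 0)
    P_hat = np.divide(P_count, dD_count[:, None],
                      out=np.zeros((n_sa, n_sa)), where=dD_count[:, None] > 0)
    return dD_hat, P_hat, r_hat
```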
## 🏗️ Constructor
```python
def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0, auxiliary_estimates=None):
```
Args:

- `dataset` (pd.DataFrame): Dataset with columns:
    - `obs_init` (int or NDArray[float]): Initial observation (state) $s_0$.
    - `obs` (int or NDArray[float]): Current observation (state) $s$.
    - `act` (int): Action $a$.
    - `rew` (float): Reward $R(s, a, s')$.
    - `obs_next` (int or NDArray[float]): Next observation (state) $s'$.
    - `probs_next_evaluation` or `probs_next` (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
- `n_obs` (int): Number of states $|S|$.
- `n_act` (int): Number of actions $|A|$.
- `path` (str, optional): Path to load/save estimates.
- `verbosity` (int): Verbosity level for logging.
- `auxiliary_estimates` (AuxiliaryEstimates, optional): Precomputed estimates.
If `auxiliary_estimates` is not provided, a new `AuxiliaryEstimates` instance will be instantiated.
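
To make the expected dataset layout concrete, here is a hedged sketch of a tiny DataFrame with integer-encoded observations; the values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Two transitions from an MDP with n_obs = 3 states and n_act = 2 actions.
# probs_next holds the target policy's action probabilities at the next state s'.
dataset = pd.DataFrame({
    "obs_init":   [0, 0],                  # initial state s_0 of the trajectory
    "obs":        [0, 1],                  # current state s
    "act":        [1, 0],                  # action a
    "rew":        [0.0, 1.0],              # reward R(s, a, s')
    "obs_next":   [1, 2],                  # next state s'
    "probs_next": [np.array([0.9, 0.1]),   # pi(. | s') per sample
                   np.array([0.5, 0.5])],
})
```

A DataFrame in this format is what the constructor expects as `dataset`.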
## 🧪 Example Usage
```python
from dice_rl_TU_Vienna.estimators.tabular.model_based import ModelBasedTabularOffPE

mb_solver = ModelBasedTabularOffPE(dataset, n_obs=20, n_act=5, path="./cache", verbosity=1)

# Access precomputed estimates
aux = mb_solver.auxiliary_estimates
d0_bar, dD_bar, P_bar, r_bar, n = aux.bar
d0_hat, dD_hat, P_hat, r_hat = aux.hat
```
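
The library's estimators consume these quantities internally; purely as an illustration of what the estimates represent, the following hedged sketch combines them by hand via the standard model-based identity $\hat\rho = (1-\gamma)\,\hat d_0^{\top} (I - \gamma \hat P)^{-1} \hat r$. It assumes `d0_hat` and `r_hat` are dense arrays of shape `(n_obs * n_act,)`, `P_hat` has shape `(n_obs * n_act, n_obs * n_act)`, and it uses a hypothetical discount factor `gamma`; none of this is taken from the library's documentation.

```python
import numpy as np

gamma = 0.99  # hypothetical discount factor

# Solve the Bellman equation Q = r + gamma * P Q for the estimated model,
# then average the initial Q-values and normalize by (1 - gamma).
Q_hat = np.linalg.solve(np.eye(len(r_hat)) - gamma * P_hat, r_hat)
rho_hat = (1 - gamma) * d0_hat @ Q_hat

print(f"estimated (normalized) policy value: {rho_hat:.4f}")
```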