# API 1.0.3.1. ModelBasedTabularOffPE
The `ModelBasedTabularOffPE` class extends `TabularOffPE` using a model-based approach. It relies on empirical estimates of initial distributions, transition dynamics, and rewards to perform off-policy evaluation in tabular settings, by estimating

- `d0`: the initial state-action distribution $d_0^\pi$,
- `dD`: the state-action dataset distribution $d^D$,
- `P`: the transition matrix $P^\pi$, and
- `r`: the expected rewards $r$.
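These quantities are handled by `AuxiliaryEstimates` (see below). As a rough sketch of the idea only, assuming a flat state-action index `i = s * n_act + a` (the actual indexing and weighting used by the library may differ), count-based estimates of $d^D$, $P^\pi$, and $r$ could be formed like this:

```python
import numpy as np

def empirical_model(obs, act, rew, obs_next, probs_next, n_obs, n_act):
    """Count-based empirical estimates of d^D, P^pi, and r (illustrative sketch only)."""
    dim = n_obs * n_act

    dD_bar = np.zeros(dim)           # visit counts per state-action pair
    P_bar = np.zeros((dim, dim))     # transition counts (s, a) -> (s', a'), weighted by pi(a' | s')
    r_bar = np.zeros(dim)            # accumulated rewards per state-action pair

    for s, a, R, s_next, p_next in zip(obs, act, rew, obs_next, probs_next):
        i = s * n_act + a            # flat state-action index (assumed convention)
        dD_bar[i] += 1
        r_bar[i] += R
        for a_next in range(n_act):
            j = s_next * n_act + a_next
            P_bar[i, j] += p_next[a_next]

    counts = np.maximum(dD_bar, 1)   # avoid division by zero for unvisited pairs
    dD_hat = dD_bar / len(obs)       # dataset state-action distribution
    P_hat = P_bar / counts[:, None]  # row-normalized transition matrix under the target policy
    r_hat = r_bar / counts           # expected reward per state-action pair
    return dD_hat, P_hat, r_hat
```

An estimate of $d_0^\pi$ is analogous, counting initial states weighted by the target policy's action probabilities at $s_0$.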
```python
def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0, auxiliary_estimates=None):
```
Args:

- `dataset` (pd.DataFrame): Dataset with columns:
    - `obs_init` (int or NDArray[float]): Initial observation (state) $s_0$.
    - `obs` (int or NDArray[float]): Current observation (state) $s$.
    - `act` (int): Action $a$.
    - `rew` (float): Reward $R(s, a, s')$.
    - `obs_next` (int or NDArray[float]): Next observation (state) $s'$.
    - `probs_next_evaluation` or `probs_next` (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
- `n_obs` (int): Number of states $|S|$.
- `n_act` (int): Number of actions $|A|$.
- `path` (str, optional): Path to load/save estimates.
- `verbosity` (int): Verbosity level for logging.
- `auxiliary_estimates` (AuxiliaryEstimates, optional): Precomputed estimates. If `auxiliary_estimates` is not provided, a new `AuxiliaryEstimates` instance is instantiated.
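For tabular problems, the observation columns hold integer state indices. Purely as an illustration (the values and the uniform target policy below are made up), a dataset with these columns could be assembled as follows and then passed to the constructor as in the usage example:

```python
import numpy as np
import pandas as pd

n_obs, n_act = 20, 5

dataset = pd.DataFrame({
    "obs_init": [0, 0, 0],                          # initial state s_0
    "obs":      [0, 3, 7],                          # current state s
    "act":      [1, 4, 2],                          # action a
    "rew":      [0.0, 1.0, 0.5],                    # reward R(s, a, s')
    "obs_next": [3, 7, 2],                          # next state s'
    "probs_next": [np.full(n_act, 1 / n_act)] * 3,  # pi(. | s'), here a uniform target policy
})
```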
```python
from dice_rl_TU_Vienna.estimators.tabular.model_based import ModelBasedTabularOffPE

mb_solver = ModelBasedTabularOffPE(dataset, n_obs=20, n_act=5, path="./cache", verbosity=1)

# Access precomputed estimates
aux = mb_solver.auxiliary_estimates
d0_bar, dD_bar, P_bar, r_bar, n = aux.bar
d0_hat, dD_hat, P_hat, r_hat = aux.hat
```
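As a hedged illustration of how these empirical quantities could be used (this is not a documented method of the class), the model-based estimates can be plugged into the discounted value identity $\hat\rho^\pi = (1-\gamma)\, \hat d_0^\top (I - \gamma \hat P)^{-1} \hat r$ for an assumed discount factor $\gamma$:

```python
import numpy as np

gamma = 0.99  # assumed discount factor; not a constructor argument of ModelBasedTabularOffPE

# Solve the Bellman system Q = r + gamma * P Q for the action-value vector,
# then average over the estimated initial state-action distribution.
# This assumes d0_hat, P_hat, r_hat are dense NumPy arrays of shape (|S||A|,), (|S||A|, |S||A|), (|S||A|,).
dim = len(r_hat)
Q_hat = np.linalg.solve(np.eye(dim) - gamma * P_hat, r_hat)
rho_hat = (1 - gamma) * d0_hat @ Q_hat  # estimated (normalized) policy value
print(rho_hat)
```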