API 1.0.2.1. TabularOffPE - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki
The `TabularOffPE` class defines a unified interface for tabular off-policy evaluation (OffPE) algorithms. It includes utilities for handling observation-action indexing and dataset integration. It is meant to be extended by specific tabular OffPE estimators.
```python
def __init__(self, dataset, n_obs, n_act):
```
Args:
- `dataset` (pd.DataFrame): See Dataset and Policies.
- `n_obs` (int): Number of states $|S|$.
- `n_act` (int): Number of actions $|A|$.
Initializes an `Indexer` instance as `self.indexer` to map `(obs, act)` pairs to flattened indices.
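A minimal sketch of such a mapping, assuming a row-major flattening over the state-action space (the actual `Indexer` implementation in `dice_rl_TU_Vienna` may differ):

```python
# Hypothetical (obs, act) -> flat index mapping; assumes row-major order.
def get_index(obs: int, act: int, n_act: int) -> int:
    """Map an (obs, act) pair to a single index in [0, n_obs * n_act)."""
    return obs * n_act + act

# With 3 actions, the pair (obs=2, act=1) lands at flat index 2 * 3 + 1 = 7.
print(get_index(2, 1, n_act=3))  # 7
```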
```python
@property
@abstractmethod
def __name__(self):
```
Should return the name of the specific estimator class.
```python
@property
def n_obs(self):
```
Number of discrete observations (states).
```python
@property
def n_act(self):
```
Number of discrete actions.
```python
@property
def dimension(self):
```
Total size of the tabular space.
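Assuming the tabular space has one entry per `(obs, act)` pair (a natural reading of "total size", not stated explicitly by the source), `dimension` would equal `n_obs * n_act`, and a row-major flattening enumerates it exactly:

```python
# Hypothetical illustration: one entry per (obs, act) pair, so the
# tabular space would have n_obs * n_act entries in total.
n_obs, n_act = 4, 3
dimension = n_obs * n_act

# A row-major flattening (assumed, not taken from the library) hits
# every index in [0, dimension) exactly once.
indices = [obs * n_act + act for obs in range(n_obs) for act in range(n_act)]
assert indices == list(range(dimension))
print(dimension)  # 12
```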
```python
def solve(self, gamma, **kwargs):
```
To be implemented by subclasses.
Args:
- `gamma` (float): Discount factor $\gamma$.
- `**kwargs`: Algorithm-specific parameters.

Returns:
- `pv_hat` (float): Estimated policy value $\hat{\rho}^\pi$.
- `info_dict` (dict): Dictionary of diagnostic or intermediate results.
```python
def get_index(self, obs, act):
```
Calls `self.indexer.get_index`.
Example subclass:

```python
class MyEstimator(TabularOffPE):
    @property
    def __name__(self):
        return "my_estimator"

    def solve(self, gamma, **kwargs):
        # Your estimation logic here
        return pv_hat, info_dict
```
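A self-contained usage sketch of the same workflow, without importing the library. `MyEstimator` here is hypothetical, and its "estimation logic" (discounted mean reward) is purely illustrative; the real dataset format is described in Dataset and Policies:

```python
import pandas as pd

class MyEstimator:
    """Standalone stand-in mimicking the TabularOffPE interface."""

    def __init__(self, dataset: pd.DataFrame, n_obs: int, n_act: int):
        self.dataset = dataset
        self.n_obs = n_obs
        self.n_act = n_act

    @property
    def __name__(self):
        return "my_estimator"

    def solve(self, gamma: float, **kwargs):
        # Toy logic: mean reward scaled by the discounting horizon 1 / (1 - gamma).
        pv_hat = self.dataset["rew"].mean() / (1 - gamma)
        info_dict = {"n_samples": len(self.dataset)}
        return pv_hat, info_dict

# Dummy two-transition dataset; column names are assumptions for this sketch.
dataset = pd.DataFrame({"obs": [0, 1], "act": [0, 1], "rew": [1.0, 0.0]})
estimator = MyEstimator(dataset, n_obs=2, n_act=2)
pv_hat, info_dict = estimator.solve(gamma=0.9)
print(pv_hat)  # 5.0
```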