API 1.0.2.1. TabularOffPE - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki
TabularOffPE
(Abstract Base Class)
The TabularOffPE class defines a unified interface for tabular off-policy evaluation (OffPE) algorithms. It provides utilities for observation-action indexing and dataset integration, and is meant to be extended by specific tabular OffPE estimators.
Constructor
def __init__(self, dataset, n_obs, n_act):
Args:
dataset (pd.DataFrame): See Dataset and Policies.
n_obs (int): Number of states $|S|$.
n_act (int): Number of actions $|A|$.
Initializes an Indexer instance as self.indexer to map (obs, act) pairs to flattened indices.
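To make the idea of a flattened index concrete, here is a minimal sketch of one common way to map (obs, act) pairs to a single integer. It assumes a row-major layout (index = obs * n_act + act); the actual Indexer may use a different convention, and the helper names flatten_index and unflatten_index are hypothetical.

```python
import numpy as np


def flatten_index(obs, act, n_obs, n_act):
    """Map (obs, act) to one index in [0, n_obs * n_act), row-major (assumed)."""
    return np.ravel_multi_index((obs, act), dims=(n_obs, n_act))


def unflatten_index(index, n_obs, n_act):
    """Inverse mapping: recover (obs, act) from a flattened index."""
    return np.unravel_index(index, (n_obs, n_act))


# Scalars and arrays are both accepted.
single = flatten_index(2, 1, n_obs=4, n_act=3)  # 2 * 3 + 1
batch = flatten_index(np.array([0, 3]), np.array([0, 2]), n_obs=4, n_act=3)
```

With this layout, every valid pair gets a distinct index, so the mapping can be inverted without storing a lookup table.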
Properties
@property
@abstractmethod
def __name__(self):
Should return the name of the specific estimator class.
@property
def n_obs(self):
Number of discrete observations (states) $|S|$.
@property
def n_act(self):
Number of discrete actions $|A|$.
@property
def dimension(self):
Total size of the tabular space $|S \times A|$.
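As a small illustration of what dimension measures, the sketch below builds a flat vector with one entry per (obs, act) pair and views it as a table. The row-major reshape is an assumption made for illustration, not a documented property of the library.

```python
import numpy as np

n_obs, n_act = 4, 3
dimension = n_obs * n_act  # |S x A| = |S| * |A|

# One entry per (obs, act) pair, stored as a flat vector.
flat = np.arange(dimension, dtype=float)

# Viewed as a table: row = obs, column = act (assumed row-major layout).
table = flat.reshape(n_obs, n_act)
```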
Solve
def solve(self, gamma, **kwargs):
To be implemented by subclasses.
Args:
gamma (float): Discount factor $\gamma$.
**kwargs: Algorithm-specific parameters.
Returns:
pv_hat (float): Estimated policy value $\hat \rho^\pi$.
info_dict (dict): Dictionary of diagnostic or intermediate results.
Utility
def get_index(self, obs, act):
Calls self.indexer.get_index.
Example
class MyEstimator(TabularOffPE):
    @property
    def __name__(self):
        return "my_estimator"

    def solve(self, gamma, **kwargs):
        # Your estimation logic here; it must define pv_hat (float) and info (dict)
        return pv_hat, info
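The template above can be exercised end to end with a self-contained sketch. Because the import path of TabularOffPE is not shown here, the code uses a hypothetical stand-in base class, TabularOffPESketch, that mirrors the documented interface. The column names "obs", "act", and "rew", the row-major index, and the MeanRewardEstimator itself are assumptions for illustration. The estimator is a placeholder that ignores gamma; it is not an actual OffPE algorithm.

```python
from abc import ABC, abstractmethod

import numpy as np
import pandas as pd


class TabularOffPESketch(ABC):
    """Hypothetical stand-in mirroring the documented interface, not the library class."""

    def __init__(self, dataset, n_obs, n_act):
        self.dataset = dataset
        self._n_obs = n_obs
        self._n_act = n_act

    @property
    @abstractmethod
    def __name__(self):
        ...

    @property
    def n_obs(self):
        return self._n_obs

    @property
    def n_act(self):
        return self._n_act

    @property
    def dimension(self):
        return self.n_obs * self.n_act

    def get_index(self, obs, act):
        # Assumed row-major flattening; the real Indexer may differ.
        return np.ravel_multi_index((obs, act), (self.n_obs, self.n_act))

    @abstractmethod
    def solve(self, gamma, **kwargs):
        ...


class MeanRewardEstimator(TabularOffPESketch):
    @property
    def __name__(self):
        return "mean_reward"

    def solve(self, gamma, **kwargs):
        df = self.dataset
        # Flatten each logged (obs, act) pair and count visits per pair.
        idx = self.get_index(np.asarray(df["obs"]), np.asarray(df["act"]))
        counts = np.bincount(idx, minlength=self.dimension)
        pv_hat = float(df["rew"].mean())  # placeholder estimate, ignores gamma
        info = {"counts": counts}
        return pv_hat, info


# Toy dataset with hypothetical column names.
dataset = pd.DataFrame(
    {"obs": [0, 1, 1, 2], "act": [0, 1, 1, 0], "rew": [1.0, 0.0, 2.0, 1.0]}
)
estimator = MeanRewardEstimator(dataset, n_obs=3, n_act=2)
pv_hat, info = estimator.solve(gamma=0.99)
```

The point of the sketch is the contract: solve returns a (pv_hat, info) pair, and get_index lets the subclass address tabular quantities of length dimension.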