API 1.0.2.1. TabularOffPE - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki

`TabularOffPE` (Abstract Base Class)

The TabularOffPE class defines a unified interface for tabular off-policy evaluation (OffPE) algorithms. It stores the dataset and provides utilities for observation-action indexing. Concrete tabular OffPE estimators extend it and implement solve.

🏗️ Constructor

def __init__(self, dataset, n_obs, n_act):

Args:

dataset (pd.DataFrame): See Dataset and Policies.
n_obs (int): Number of discrete observations (states) $|S|$.
n_act (int): Number of actions $|A|$.

Initializes an Indexer instance as self.indexer to map (obs, act) pairs to flattened indices.
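
To make the idea of a flattened index concrete, here is a minimal sketch of one common layout. It is not the library's Indexer: the class name SketchIndexer, the helper get_pair, and the row-major ordering (index = obs * n_act + act) are assumptions for illustration, and the actual Indexer may order pairs differently.

```python
class SketchIndexer:
    """Hypothetical stand-in for the library's Indexer (row-major layout assumed)."""

    def __init__(self, n_obs, n_act):
        self.n_obs = n_obs
        self.n_act = n_act

    @property
    def dimension(self):
        # Size of the flattened table, |S x A| = |S| * |A|.
        return self.n_obs * self.n_act

    def get_index(self, obs, act):
        # Assumed row-major flattening: all actions of one observation are contiguous.
        if not (0 <= obs < self.n_obs and 0 <= act < self.n_act):
            raise IndexError("obs or act out of range")
        return obs * self.n_act + act

    def get_pair(self, index):
        # Inverse of get_index under the same assumed layout (hypothetical helper).
        return divmod(index, self.n_act)


indexer = SketchIndexer(n_obs=3, n_act=2)
flat = indexer.get_index(obs=2, act=1)
```

Under this assumed layout, a vector of length dimension can hold one entry per (obs, act) pair, which is the shape tabular estimators typically work with.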

📦 Properties

@property
@abstractmethod
def __name__(self):

Should return the name of the specific estimator class.

@property
def n_obs(self):

Number of discrete observations (states) $|S|$.

@property
def n_act(self):

Number of discrete actions $|A|$.

@property
def dimension(self):

Total size of the tabular space $|S \times A|$.

🚀 Solve

def solve(self, gamma, **kwargs):

To be implemented by subclasses.

Args:

gamma (float): Discount factor $\gamma$.
**kwargs: Algorithm-specific parameters.

Returns:

pv_hat (float): Estimated policy value $\hat \rho^\pi$.
info_dict (dict): Dictionary of diagnostic or intermediate results.

⚙️ Utility

def get_index(self, obs, act):

Returns the flattened index of the pair (obs, act) by calling self.indexer.get_index.

🧪 Example

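class MyEstimator(TabularOffPE):
    @property
    def __name__(self):
        # Name reported for this estimator.
        return "my_estimator"

    def solve(self, gamma, **kwargs):
        # Your estimation logic here; the values below are placeholders.
        pv_hat = 0.0  # estimated policy value
        info = {}     # diagnostic or intermediate results
        return pv_hat, info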
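
Since this page does not give the import path for TabularOffPE, the sketch below uses a self-contained stand-in to show how the pieces of the interface fit together and how a caller unpacks the result of solve. SketchTabularOffPE, ConstantEstimator, and the value keyword are hypothetical names introduced for illustration. The stand-in only mirrors the documented signatures, omits the Indexer, and accepts any object as dataset, whereas the real class expects a pd.DataFrame.

```python
from abc import ABC, abstractmethod


class SketchTabularOffPE(ABC):
    """Hypothetical stand-in mirroring the documented interface, not the library class."""

    def __init__(self, dataset, n_obs, n_act):
        self.dataset = dataset
        self._n_obs = n_obs
        self._n_act = n_act

    @property
    @abstractmethod
    def __name__(self):
        ...

    @property
    def n_obs(self):
        return self._n_obs

    @property
    def n_act(self):
        return self._n_act

    @property
    def dimension(self):
        # |S x A| = |S| * |A|
        return self.n_obs * self.n_act

    @abstractmethod
    def solve(self, gamma, **kwargs):
        ...


class ConstantEstimator(SketchTabularOffPE):
    @property
    def __name__(self):
        return "constant_estimator"

    def solve(self, gamma, **kwargs):
        # Placeholder logic: return a fixed value supplied by the caller.
        pv_hat = float(kwargs.get("value", 0.0))
        info = {"gamma": gamma, "dimension": self.dimension}
        return pv_hat, info


estimator = ConstantEstimator(dataset=None, n_obs=4, n_act=3)
pv_hat, info = estimator.solve(gamma=0.99, value=1.5)
```

The point of the sketch is the contract: solve takes gamma plus algorithm-specific keyword arguments and returns a (pv_hat, info_dict) pair, so calling code can treat all tabular estimators uniformly.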