API 1.0.2.1. TabularOffPE - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki

TabularOffPE (Abstract Base Class)

The TabularOffPE class defines a unified interface for tabular off-policy evaluation (OffPE) algorithms. It includes utilities for handling observation-action indexing and dataset integration. It is meant to be extended by specific tabular OffPE estimators.

๐Ÿ—๏ธ Constructor

def __init__(self, dataset, n_obs, n_act):

Args:

  • dataset (pd.DataFrame): See Dataset and Policies.
  • n_obs (int): Number of states $|S|$.
  • n_act (int): Number of actions $|A|$.

Initializes an Indexer instance as self.indexer to map (obs, act) pairs to flattened indices.
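The mapping from (obs, act) pairs to flattened indices can be pictured with a minimal sketch. `IndexerSketch` and its `get_pair` helper are hypothetical stand-ins, not the library's `Indexer`, and the row-major layout (actions varying fastest) is an assumption; the actual class may order indices differently.

```python
class IndexerSketch:
    """Hypothetical stand-in for the library's Indexer (row-major layout assumed)."""

    def __init__(self, n_obs, n_act):
        self.n_obs = n_obs
        self.n_act = n_act

    @property
    def dimension(self):
        # Size of the flattened observation-action space
        return self.n_obs * self.n_act

    def get_index(self, obs, act):
        # Assumed convention: each observation owns a block of n_act indices
        if not (0 <= obs < self.n_obs and 0 <= act < self.n_act):
            raise IndexError(f"(obs={obs}, act={act}) is outside the table")
        return obs * self.n_act + act

    def get_pair(self, index):
        # Inverse of get_index under the same assumed layout
        return divmod(index, self.n_act)


indexer = IndexerSketch(n_obs=3, n_act=2)
flat = indexer.get_index(obs=2, act=1)  # 2 * 2 + 1 = 5
pair = indexer.get_pair(flat)           # (2, 1)
```

Under this layout, a vector of length `dimension` can hold one entry per observation-action pair, which is the shape tabular estimators typically work with.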

📦 Properties

@property
@abstractmethod
def __name__(self):

Should return the name of the specific estimator class.

@property
def n_obs(self):

Number of discrete observations (states) $|S|$.

@property
def n_act(self):

Number of discrete actions $|A|$.

@property
def dimension(self):

Total size of the tabular observation-action space, $|S \times A| = |S| \cdot |A|$.

🚀 Solve

def solve(self, gamma, **kwargs):

To be implemented by subclasses.

Args:

  • gamma (float): Discount factor $\gamma$.
  • **kwargs: Algorithm-specific parameters.

Returns:

  • pv_hat (float): Estimated policy value $\hat \rho^\pi$.
  • info_dict (dict): Dictionary of diagnostic or intermediate results.

โš™๏ธ Utility

def get_index(self, obs, act):

Returns the flattened index of the (obs, act) pair by delegating to self.indexer.get_index.

🧪 Example

class MyEstimator(TabularOffPE):
    @property
    def __name__(self):
        return "my_estimator"

    def solve(self, gamma, **kwargs):
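        # Your estimation logic here; the placeholders below keep the skeleton runnable
        pv_hat = 0.0  # estimated policy value
        info = {}     # diagnostic or intermediate results
        return pv_hat, info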
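To try the subclassing pattern without the library installed, the documented interface can be mirrored by a stand-in base class. This is a sketch under stated assumptions: `TabularOffPESketch` is hypothetical, `solve` is marked abstract here although the documentation only says subclasses implement it, the index layout is assumed to be row-major, and the placeholder `solve` performs no real estimation.

```python
from abc import ABC, abstractmethod


class TabularOffPESketch(ABC):
    """Hypothetical stand-in mirroring the documented interface, not the library class."""

    def __init__(self, dataset, n_obs, n_act):
        self.dataset = dataset
        self._n_obs = n_obs
        self._n_act = n_act

    @property
    @abstractmethod
    def __name__(self):
        """Name of the specific estimator class."""

    @property
    def n_obs(self):
        return self._n_obs

    @property
    def n_act(self):
        return self._n_act

    @property
    def dimension(self):
        return self.n_obs * self.n_act

    @abstractmethod
    def solve(self, gamma, **kwargs):
        """Return (pv_hat, info_dict); abstract here by assumption."""

    def get_index(self, obs, act):
        # Assumed row-major layout; the library delegates to self.indexer instead
        return obs * self.n_act + act


class MyEstimator(TabularOffPESketch):
    @property
    def __name__(self):
        return "my_estimator"

    def solve(self, gamma, **kwargs):
        # Placeholder logic only: a real estimator would use self.dataset
        pv_hat = 0.0
        info = {"gamma": gamma, "dimension": self.dimension}
        return pv_hat, info


estimator = MyEstimator(dataset=None, n_obs=3, n_act=2)
pv_hat, info = estimator.solve(gamma=0.99)

# The base class itself cannot be instantiated because it has abstract members
try:
    TabularOffPESketch(dataset=None, n_obs=3, n_act=2)
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

The two-element return value matches the documented contract: callers unpack the estimate and the diagnostics dictionary in one step.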
โš ๏ธ **GitHub.com Fallback** โš ๏ธ