API 1.0.3.2. AuxiliaryEstimates - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki

AuxiliaryEstimates (Utility Class)

The AuxiliaryEstimates class provides empirical estimates, computed from offline data, that model-based tabular off-policy evaluation (OffPE) methods rely on. To run those methods, use ModelBasedTabularOffPE.

🏗️ Constructor

def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0):

Args:

  • dataset (pd.DataFrame): Dataset with columns:
    • obs_init (int or NDArray[float]): Initial observation (state) $s_0$.
    • obs (int or NDArray[float]): Current observation (state) $s$.
    • act (int): Action $a$.
    • rew (float): Reward $R(s, a, s')$.
    • obs_next (int or NDArray[float]): Next observation (state) $s'$.
    • probs_next_evaluation or probs_next (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
  • n_obs (int): Number of states $|S|$.
  • n_act (int): Number of actions $|A|$.
  • path (str, optional): Path to load/save estimates.
  • verbosity (int): Verbosity level for logging.

Estimates are either loaded from disk or computed from the dataset.
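As a rough illustration of the expected input, the following sketch builds a tiny dataset with the columns listed above. The values, the sizes n_obs = 3 and n_act = 2, and the sanity checks are made up for illustration; the constructor call is left as a comment because the import path is not given on this page.

```python
import numpy as np
import pandas as pd

n_obs, n_act = 3, 2  # hypothetical sizes of the state and action spaces

# Each row is one logged transition; all values are invented for illustration.
rows = [
    {"obs_init": 0, "obs": 0, "act": 1, "rew": 0.0, "obs_next": 1,
     "probs_next": np.array([0.5, 0.5])},
    {"obs_init": 0, "obs": 1, "act": 0, "rew": 1.0, "obs_next": 2,
     "probs_next": np.array([0.9, 0.1])},
    {"obs_init": 2, "obs": 2, "act": 1, "rew": -1.0, "obs_next": 0,
     "probs_next": np.array([0.0, 1.0])},
]
dataset = pd.DataFrame(rows)

# Basic sanity checks (not part of the library): indices must be in range
# and every probs_next vector must be a distribution over the n_act actions.
for col in ("obs_init", "obs", "obs_next"):
    assert dataset[col].between(0, n_obs - 1).all()
assert dataset["act"].between(0, n_act - 1).all()
assert all(len(p) == n_act and np.isclose(p.sum(), 1.0)
           for p in dataset["probs_next"])

# With the class imported from the package, the call would follow the
# signature documented above:
# estimates = AuxiliaryEstimates(dataset, n_obs, n_act, path=None, verbosity=0)
```

Tabular observations are stored here as plain integers; the NDArray[float] variant of the observation columns is not covered by this sketch.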

📦 Properties

@property
def bar(self):

Returns:

  • estimates_unnormalized (tuple): tuple of (unnormalized) empirical estimates:
    • d0_bar (np.ndarray): Empirical initial visitation counts $\bar d^\pi_0$.
    • dD_bar (np.ndarray): Empirical dataset visitation counts $\bar d^D$.
    • P_bar (np.ndarray): Empirical transition counts $\bar P^\pi$.
    • r_bar (np.ndarray): Empirical reward totals $\bar r$.
    • n (int): Number of samples $n$.

@property
def hat(self):

Returns:

  • estimates_normalized (tuple): tuple of (normalized) empirical estimates:
    • d0_hat (np.ndarray): Empirical initial distribution $\hat d^\pi_0$.
    • dD_hat (np.ndarray): Empirical dataset distribution $\hat d^D$.
    • P_hat (np.ndarray): Empirical transition matrix $\hat P^\pi$.
    • r_hat (np.ndarray): Empirical expected reward function $\hat r$.

For more information, see the wiki page on Bellman Equations.
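The relationship between the unnormalized (bar) and normalized (hat) estimates can be sketched as follows. This is one plausible normalization, not the library's code: it assumes that all arrays are indexed by the same flat index, that the distributions are obtained by dividing counts by $n$, and that transition counts and reward totals are divided by the dataset visitation counts, with unvisited entries left at zero. The helper names safe_divide and normalize are hypothetical.

```python
import numpy as np

def safe_divide(num, den):
    """Elementwise num / den, returning 0 wherever den is not positive."""
    num = np.asarray(num, dtype=float)
    den = np.broadcast_to(np.asarray(den, dtype=float), num.shape)
    out = np.zeros_like(num)
    np.divide(num, den, out=out, where=den > 0)
    return out

def normalize(d0_bar, dD_bar, P_bar, r_bar, n):
    """Hypothetical bar -> hat normalization under the stated assumptions."""
    d0_hat = d0_bar / n                          # counts -> distribution
    dD_hat = dD_bar / n                          # counts -> distribution
    P_hat = safe_divide(P_bar, dD_bar[:, None])  # row-wise transition frequencies
    r_hat = safe_divide(r_bar, dD_bar)           # reward total -> mean reward
    return d0_hat, dD_hat, P_hat, r_hat

# Toy counts over three indices; the last index is never visited.
d0_bar = np.array([4.0, 0.0, 0.0])
dD_bar = np.array([3.0, 1.0, 0.0])
P_bar = np.array([[1.0, 2.0, 0.0],
                  [0.5, 0.5, 0.0],
                  [0.0, 0.0, 0.0]])
r_bar = np.array([6.0, -1.0, 0.0])

d0_hat, dD_hat, P_hat, r_hat = normalize(d0_bar, dD_bar, P_bar, r_bar, n=4)
```

Leaving unvisited rows at zero is a design choice of this sketch; how the library treats indices without data is not stated on this page.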

⚙️ Utility

def load(self):

Loads bar estimates from precomputed .npy files at path.

def save(self):

Saves current bar estimates to .npy files at path.
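A round trip through .npy files can be sketched with NumPy alone. The file names below (one file per estimate, named after the tuple entries) and the helpers save_bar and load_bar are assumptions for illustration; the actual file layout used by save and load is not documented here.

```python
import os
import tempfile
import numpy as np

# Hypothetical file stems, one .npy file per entry of the bar tuple.
NAMES = ("d0_bar", "dD_bar", "P_bar", "r_bar", "n")

def save_bar(path, bar):
    """Write each entry of the bar tuple to <path>/<name>.npy."""
    os.makedirs(path, exist_ok=True)
    for name, value in zip(NAMES, bar):
        np.save(os.path.join(path, f"{name}.npy"), value)

def load_bar(path):
    """Read the bar tuple back; n is stored as a 0-d array and cast to int."""
    d0, dD, P, r, n = (np.load(os.path.join(path, f"{name}.npy"))
                       for name in NAMES)
    return d0, dD, P, r, int(n)

# Toy estimates, written to a temporary directory and read back.
bar = (np.array([2.0, 0.0]), np.array([1.0, 1.0]),
       np.eye(2), np.array([0.5, 0.0]), 2)
with tempfile.TemporaryDirectory() as tmp:
    save_bar(tmp, bar)
    loaded = load_bar(tmp)
```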

def create(self):

Generates bar estimates from the dataset by iterating over all rows and accumulating the visitation counts, transition counts, and reward totals.
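A minimal sketch of such a counting pass is given below. It is not the library's implementation: it assumes estimates are indexed by flattened state-action pairs (index obs * n_act + act) and that each transition count is spread over next actions according to probs_next. The initial counts d0_bar are omitted, since this page does not say how initial actions enter them. The function name count_estimates and the tuple layout of the transitions are hypothetical.

```python
import numpy as np

def count_estimates(transitions, n_obs, n_act):
    """Accumulate unnormalized estimates from (obs, act, rew, obs_next, probs_next)."""
    k = n_obs * n_act                 # number of flattened state-action pairs
    dD_bar = np.zeros(k)              # dataset visitation counts
    P_bar = np.zeros((k, k))          # transition counts under the target policy
    r_bar = np.zeros(k)               # reward totals
    n = 0                             # number of samples
    for obs, act, rew, obs_next, probs_next in transitions:
        i = obs * n_act + act         # flat index of the visited pair (s, a)
        dD_bar[i] += 1
        r_bar[i] += rew
        start = obs_next * n_act      # first flat index belonging to s'
        P_bar[i, start:start + n_act] += probs_next  # weight by pi(. | s')
        n += 1
    return dD_bar, P_bar, r_bar, n

# Three made-up transitions in a problem with 2 states and 2 actions.
transitions = [
    (0, 1, 1.0, 1, np.array([0.5, 0.5])),
    (1, 0, 0.0, 0, np.array([1.0, 0.0])),
    (0, 1, 2.0, 0, np.array([1.0, 0.0])),
]
dD_bar, P_bar, r_bar, n = count_estimates(transitions, n_obs=2, n_act=2)
```

Because every probs_next vector sums to one, each row of P_bar in this sketch sums to the corresponding entry of dD_bar, which is what makes a row-wise normalization by the visitation counts meaningful.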