API 1.0.3.2. AuxiliaryEstimates - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki

`AuxiliaryEstimates` (Utility Class)

The `AuxiliaryEstimates` class computes empirical estimates from offline data for model-based tabular off-policy evaluation (OffPE) methods. To implement such methods, use `ModelBasedTabularOffPE`.

🏗️ Constructor

def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0):

Args:

dataset (pd.DataFrame): Dataset with columns:
- obs_init (int or NDArray[float]): Initial observation (state) $s_0$.
- obs (int or NDArray[float]): Current observation (state) $s$.
- act (int): Action $a$.
- rew (float): Reward $R(s, a, s')$.
- obs_next (int or NDArray[float]): Next observation (state) $s'$.
- probs_next_evaluation or probs_next (NDArray[float]): Action probabilities under the target policy at the next state $\pi(\cdot \mid s')$.
n_obs (int): Number of states $|S|$.
n_act (int): Number of actions $|A|$.
path (str, optional): Path to load/save estimates.
verbosity (int): Verbosity level for logging.

Estimates are either loaded from disk or computed from the dataset.
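
As a rough illustration of the expected input, the sketch below builds a toy dataset with the columns listed above. The values, the sizes `n_obs = 3` and `n_act = 2`, and the uniform target policy are all made up for the example. The import path of `AuxiliaryEstimates` is not given on this page, so the constructor call itself is only described in a comment.

```python
import numpy as np
import pandas as pd

n_obs, n_act = 3, 2  # toy sizes, chosen only for illustration

# Assumed target policy: uniform over actions at every next state.
uniform = np.full(n_act, 1.0 / n_act)

dataset = pd.DataFrame({
    "obs_init":   [0, 0, 0, 0],                       # s_0
    "obs":        [0, 1, 0, 2],                       # s
    "act":        [1, 0, 0, 1],                       # a
    "rew":        [0.0, 1.0, 0.5, 0.0],               # R(s, a, s')
    "obs_next":   [1, 2, 2, 0],                       # s'
    "probs_next": [uniform.copy() for _ in range(4)], # pi(. | s')
})

# Check that every documented column is present before handing the frame over.
required = {"obs_init", "obs", "act", "rew", "obs_next", "probs_next"}
missing = required - set(dataset.columns)

# Per the signature above, the frame would then be passed as
# AuxiliaryEstimates(dataset, n_obs, n_act); that call is omitted here
# because the module to import from is not stated on this page.
```

Storing each `probs_next` entry as its own array keeps one probability vector per row, which matches the `NDArray[float]` type given for that column.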

📦 Properties

@property
def bar(self):

Returns:

estimates_unnormalized (tuple): tuple of (unnormalized) empirical estimates:
- d0_bar (np.ndarray): Empirical initial visitation counts $\bar d^\pi_0$.
- dD_bar (np.ndarray): Empirical dataset visitation counts $\bar d^D$.
- P_bar (np.ndarray): Empirical transition counts $\bar P^\pi$.
- r_bar (np.ndarray): Empirical reward totals $\bar r$.
- n (int): Number of samples $n$.
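
To make the idea of unnormalized estimates concrete, here is a minimal sketch of how such counts could be accumulated from transitions. It is not the library's implementation: the flat index `s * n_act + a`, the array shapes, and the toy data are assumptions made for the example, and `d0_bar` is left out because this page does not say how initial actions are weighted.

```python
import numpy as np

n_obs, n_act = 2, 2
dim = n_obs * n_act  # assumed: one entry per (state, action) pair

# Toy transitions (s, a, r, s') and target-policy probabilities at s'.
obs = np.array([0, 0, 1])
act = np.array([0, 1, 0])
rew = np.array([1.0, 0.0, 2.0])
obs_next = np.array([1, 1, 0])
probs_next = np.array([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]])


def flat(s, a):
    """Assumed flattening of a (state, action) pair into a single index."""
    return s * n_act + a


dD_bar = np.zeros(dim)         # visitation counts per (s, a)
r_bar = np.zeros(dim)          # summed rewards per (s, a)
P_bar = np.zeros((dim, dim))   # policy-weighted transition counts

idx = flat(obs, act)
np.add.at(dD_bar, idx, 1.0)    # add.at accumulates correctly on repeated indices
np.add.at(r_bar, idx, rew)

# Each sample spreads one unit of count over (s', a') according to pi(a' | s').
for i, s_next, p in zip(idx, obs_next, probs_next):
    for a_next in range(n_act):
        P_bar[i, flat(s_next, a_next)] += p[a_next]

n = len(obs)  # number of samples
```

Under these assumptions each row of `P_bar` sums to the corresponding entry of `dD_bar`, which is what lets the counts be normalized into a row-stochastic matrix later.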

@property
def hat(self):

Returns:

estimates_normalized (tuple): tuple of (normalized) empirical estimates:
- d0_hat (np.ndarray): Empirical initial distribution $\hat d^\pi_0$.
- dD_hat (np.ndarray): Empirical dataset distribution $\hat d^D$.
- P_hat (np.ndarray): Empirical transition matrix $\hat P^\pi$.
- r_hat (np.ndarray): Empirical expected reward function $\hat r$.

For more information, see the wiki page on Bellman Equations.
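
The relationship between the `bar` and `hat` estimates can be sketched as a normalization step. The rule used below (dividing visitation counts by `n`, and dividing reward totals and transition counts by the per-pair visitation count) is an assumption about how the normalization works, not a statement of the library's code. The helper `safe_divide` and the toy arrays are hypothetical.

```python
import numpy as np


def safe_divide(num, den):
    """Elementwise num / den, leaving 0 wherever den is 0 (hypothetical helper)."""
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den != 0)
    return out


# Toy unnormalized estimates over 4 (state, action) pairs; the last pair is unvisited.
dD_bar = np.array([1.0, 1.0, 1.0, 0.0])
r_bar = np.array([1.0, 0.0, 2.0, 0.0])
P_bar = np.array([
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
n = 3

dD_hat = dD_bar / n                          # empirical dataset distribution
r_hat = safe_divide(r_bar, dD_bar)           # mean reward per visited pair
P_hat = safe_divide(P_bar, dD_bar[:, None])  # rows of visited pairs sum to 1
```

Unvisited pairs are left at zero here rather than producing `nan`; a real implementation might handle them differently.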

⚙️ Utility

def load(self):

Loads the `bar` estimates from precomputed `.npy` files at `path`.

def save(self):

Saves the current `bar` estimates to `.npy` files at `path`.
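
A save/load round trip of this kind can be sketched with plain NumPy. The file names (one `.npy` file per estimate, named after the array) and the temporary directory are assumptions for the example; the actual file layout used by `save` and `load` is not specified on this page.

```python
import os
import tempfile

import numpy as np

# Toy unnormalized estimates standing in for the contents of `bar`.
bar = {
    "dD_bar": np.array([1.0, 1.0, 1.0, 0.0]),
    "r_bar": np.array([1.0, 0.0, 2.0, 0.0]),
}

with tempfile.TemporaryDirectory() as path:
    # Save: one .npy file per array (assumed naming scheme).
    for name, arr in bar.items():
        np.save(os.path.join(path, f"{name}.npy"), arr)

    # Load: read the same files back into a dictionary.
    loaded = {
        name: np.load(os.path.join(path, f"{name}.npy"))
        for name in bar
    }
```

Caching the counts rather than the normalized values means the normalized estimates can always be recomputed after loading.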

def create(self):