API 1.0.3.2. AuxiliaryEstimates - Reinforcement-Learning-TU-Vienna/dice_rl_TU_Vienna GitHub Wiki
AuxiliaryEstimates (Utility Class)
The AuxiliaryEstimates class computes empirical estimates from offline data. Model-based tabular off-policy evaluation (OffPE) methods use these estimates. To implement such methods, use ModelBasedTabularOffPE.
🏗️ Constructor
def __init__(self, dataset, n_obs, n_act, path=None, verbosity=0):
Args:
- dataset (pd.DataFrame): Dataset with columns:
  - obs_init (int or NDArray[float]): Initial observation (state) $s_0$.
  - obs (int or NDArray[float]): Current observation (state) $s$.
  - act (int): Action $a$.
  - rew (float): Reward $R(s, a, s')$.
  - obs_next (int or NDArray[float]): Next observation (state) $s'$.
  - probs_next_evaluation or probs_next (NDArray[float]): Action probabilities under the target policy at the next state, $\pi(\cdot \mid s')$.
- n_obs (int): Number of states $|S|$.
- n_act (int): Number of actions $|A|$.
- path (str, optional): Path used to load and save the estimates.
- verbosity (int): Verbosity level for logging.
On construction, the estimates are either loaded from disk (from path) or computed from the dataset.
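The following sketch builds a toy tabular DataFrame with the documented columns. It shows the expected dataset layout for a hypothetical problem with 3 states and 2 actions. The values are invented for demonstration. The sketch uses the `probs_next` column name; according to the argument list above, `probs_next_evaluation` is the alternative name. The constructor call stays a comment because this page does not give the package import path.

```python
import numpy as np
import pandas as pd

# Hypothetical toy problem: 3 states, 2 actions (tabular, so observations are ints).
n_obs, n_act = 3, 2

# Each row is one transition (s, a, r, s') plus the episode's initial state s_0
# and the target policy's action probabilities pi(. | s') at the next state.
dataset = pd.DataFrame({
    "obs_init": [0, 0, 0, 1],
    "obs": [0, 1, 2, 1],
    "act": [1, 0, 1, 1],
    "rew": [0.0, 1.0, 0.5, 1.0],
    "obs_next": [1, 2, 0, 2],
    # One probability vector of length n_act per row; pi(. | s') depends only on s'.
    "probs_next": [np.array([0.5, 0.5]), np.array([0.9, 0.1]),
                   np.array([0.2, 0.8]), np.array([0.9, 0.1])],
})

# Sanity checks matching the column contract described above.
assert dataset[["obs", "obs_next", "obs_init"]].isin(list(range(n_obs))).all().all()
assert dataset["act"].between(0, n_act - 1).all()
assert all(np.isclose(p.sum(), 1.0) for p in dataset["probs_next"])

# With the package available, the estimates would then be built as
# AuxiliaryEstimates(dataset, n_obs, n_act)  (import path not shown on this page).
```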
📦 Properties
@property
def bar(self):
Returns:
- estimates_unnormalized (tuple): Tuple of (unnormalized) empirical estimates:
  - d0_bar (np.ndarray): Empirical initial visitation counts $\bar d^\pi_0$.
  - dD_bar (np.ndarray): Empirical dataset visitation counts $\bar d^D$.
  - P_bar (np.ndarray): Empirical transition counts $\bar P^\pi$.
  - r_bar (np.ndarray): Empirical reward totals $\bar r$.
  - n (int): Number of samples $n$.
@property
def hat(self):
Returns:
- estimates_normalized (tuple): Tuple of (normalized) empirical estimates:
  - d0_hat (np.ndarray): Empirical initial distribution $\hat d^\pi_0$.
  - dD_hat (np.ndarray): Empirical dataset distribution $\hat d^D$.
  - P_hat (np.ndarray): Empirical transition matrix $\hat P^\pi$.
  - r_hat (np.ndarray): Empirical expected reward function $\hat r$.
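Here is a minimal sketch of how the hat estimates could follow from the bar counts. It assumes shapes that this page does not confirm: d0_bar (n_obs,), dD_bar (n_obs, n_act), P_bar (n_obs, n_act, n_obs, n_act) and r_bar (n_obs, n_act). The hypothetical helper `normalize` divides by totals or by per-pair visit counts. The page does not say how the actual class treats state-action pairs that never occur in the data, so leaving them at zero is also an assumption.

```python
import numpy as np


def normalize(d0_bar, dD_bar, P_bar, r_bar, n):
    """Hypothetical helper: turn the bar counts into hat estimates."""
    d0_hat = d0_bar / d0_bar.sum()  # initial-state distribution
    dD_hat = dD_bar / n  # dataset (s, a) distribution
    # Per-pair visit counts; unvisited pairs divide by 1 so their (zero) entries stay 0.
    visits = np.where(dD_bar > 0, dD_bar, 1.0)
    P_hat = P_bar / visits[:, :, None, None]  # row (s, a) -> distribution over (s', a')
    r_hat = r_bar / visits  # mean reward per (s, a)
    return d0_hat, dD_hat, P_hat, r_hat


# Toy counts for 2 states and 2 actions (n = 4 transitions; (s=0, a=1) never visited).
d0_bar = np.array([3.0, 1.0])
dD_bar = np.array([[2.0, 0.0], [1.0, 1.0]])
P_bar = np.zeros((2, 2, 2, 2))
P_bar[0, 0, 0] = [1.0, 1.0]  # two transitions (0, 0) -> s'=0 with pi(.|0) = [0.5, 0.5]
P_bar[1, 0, 1] = [1.0, 0.0]  # one transition (1, 0) -> s'=1 with pi(.|1) = [1, 0]
P_bar[1, 1, 0] = [0.5, 0.5]  # one transition (1, 1) -> s'=0
r_bar = np.array([[1.0, 0.0], [0.5, 2.0]])

d0_hat, dD_hat, P_hat, r_hat = normalize(d0_bar, dD_bar, P_bar, r_bar, n=4)
```

Under this convention, each visited row of P_hat sums to 1 over $(s', a')$, because the target-policy probabilities at $s'$ sum to 1. Unvisited rows sum to 0.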
For more information, see the wiki page on Bellman Equations.
⚙️ Utility
def load(self):
Loads the bar estimates from precomputed .npy files at path.
def save(self):
Saves the current bar estimates as .npy files at path.
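One way to picture load and save is as a round trip of the bar tuple through one .npy file per estimate. This is only a sketch. The helpers `save_bar` and `load_bar` and the file names are hypothetical, because the page does not specify the naming scheme used under path.

```python
import os
import tempfile

import numpy as np

# Hypothetical file names; the names actually used by AuxiliaryEstimates are not documented here.
NAMES = ("d0_bar", "dD_bar", "P_bar", "r_bar", "n")


def save_bar(path, bar):
    """Write each bar estimate to its own .npy file in directory `path`."""
    os.makedirs(path, exist_ok=True)
    for name, value in zip(NAMES, bar):
        np.save(os.path.join(path, f"{name}.npy"), value)


def load_bar(path):
    """Read the bar estimates back; n is stored as a 0-d array, so convert it to int."""
    d0_bar, dD_bar, P_bar, r_bar, n = (
        np.load(os.path.join(path, f"{name}.npy")) for name in NAMES
    )
    return d0_bar, dD_bar, P_bar, r_bar, int(n)


bar = (np.array([3.0, 1.0]), np.ones((2, 2)), np.zeros((2, 2, 2, 2)), np.zeros((2, 2)), 4)
with tempfile.TemporaryDirectory() as tmp:
    save_bar(tmp, bar)
    loaded = load_bar(tmp)  # arrays are fully read into memory before tmp is deleted
```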
def create(self):
Computes the bar estimates from the dataset by iterating over all rows and accumulating the counts and reward totals listed under bar.
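The sketch below shows the kind of row-by-row counting that create() describes. The counting rules are assumptions, not the class's confirmed implementation:
- d0_bar counts initial states.
- dD_bar counts $(s, a)$ pairs.
- P_bar adds $\pi(a' \mid s')$ to each observed $(s, a) \to (s', a')$ entry.
- r_bar sums rewards per $(s, a)$.

In particular, the $\pi$ superscript on $\bar d^\pi_0$ suggests the real class may weight initial counts by the target policy, which this sketch does not do. The name `create_bar` is hypothetical.

```python
import numpy as np
import pandas as pd


def create_bar(dataset, n_obs, n_act):
    """Hypothetical sketch of create(): one pass over the rows, accumulating counts."""
    d0_bar = np.zeros(n_obs)  # initial-state counts
    dD_bar = np.zeros((n_obs, n_act))  # (s, a) visit counts
    P_bar = np.zeros((n_obs, n_act, n_obs, n_act))  # (s, a) -> (s', a') weighted by pi(a'|s')
    r_bar = np.zeros((n_obs, n_act))  # reward totals per (s, a)
    for row in dataset.itertuples(index=False):
        d0_bar[row.obs_init] += 1
        dD_bar[row.obs, row.act] += 1
        P_bar[row.obs, row.act, row.obs_next, :] += row.probs_next
        r_bar[row.obs, row.act] += row.rew
    n = len(dataset)
    return d0_bar, dD_bar, P_bar, r_bar, n


# Toy dataset: 2 states, 2 actions, 3 transitions.
dataset = pd.DataFrame({
    "obs_init": [0, 0, 1],
    "obs": [0, 0, 1],
    "act": [0, 0, 1],
    "rew": [1.0, 0.0, 2.0],
    "obs_next": [1, 1, 0],
    "probs_next": [np.array([0.5, 0.5]), np.array([0.5, 0.5]), np.array([1.0, 0.0])],
})
d0_bar, dD_bar, P_bar, r_bar, n = create_bar(dataset, n_obs=2, n_act=2)
```

Each probability vector sums to 1, so summing P_bar over $(s', a')$ recovers dD_bar. This gives a cheap consistency check on the counts.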