# Hyperparameters (TODO)
This page documents the hyperparameters used in the offline policy evaluation algorithms implemented in this repository. These include settings for DICE-based methods, importance sampling estimators, and value function solvers.
## General Notation
- Let $\gamma \in (0, 1]$ be the discount factor.
- Let $\pi$ denote the evaluation policy.
- Let $\pi^{\mathcal D}$ denote the behavior policy.
- Let $w_{\pi / \mathcal D}^\gamma$ denote the stationary distribution correction ratio.
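
These quantities fit together as follows. A standard identity from the DICE literature (stated here for orientation; not taken from this repository's code) writes the value of $\pi$ as a $w_{\pi / \mathcal D}^\gamma$-weighted average of rewards under the dataset distribution $d^{\mathcal D}$:

$$
\rho(\pi) = \mathbb{E}_{(s, a) \sim d^{\mathcal D}} \left[ w_{\pi / \mathcal D}^\gamma(s, a) \, r(s, a) \right],
$$

so estimating $w_{\pi / \mathcal D}^\gamma$ accurately is the central task of the methods whose hyperparameters are listed below.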
## Optimization Hyperparameters

For DICE methods (e.g., DualDICE, GenDICE):
- `learning_rate`: Learning rate used by the optimizer (e.g., Adam).
- `batch_size`: Number of samples per gradient update.
- `num_iterations`: Total number of training iterations.
- `gradient_clip`: Optional max-norm for gradient clipping.
- `optimizer`: Optimizer type (usually `adam`).
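
As a concrete illustration, here is a minimal sketch of how these settings could be wired into a TensorFlow optimizer; the `config` dictionary and its values are hypothetical, not the repository's actual defaults:

```python
import tensorflow as tf

# Hypothetical values for illustration; see the experiment scripts
# in the repository for the real defaults and overrides.
config = {
    "learning_rate": 1e-4,
    "batch_size": 512,
    "num_iterations": 100_000,
    "gradient_clip": 1.0,  # max global gradient norm; None disables clipping
    "optimizer": "adam",
}

# Keras optimizers support gradient clipping directly via `global_clipnorm`.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=config["learning_rate"],
    global_clipnorm=config["gradient_clip"],
)
```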
Regularization parameters:

- `entropy_coeff`: Coefficient for entropy regularization (if applicable).
- `l2_weight`: Weight for L2 regularization.
- `dual_reg_coeff`: Coefficient used for dual variable regularization.
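
These coefficients typically enter the training objective as additive penalty terms. A minimal sketch, assuming placeholder loss terms rather than the repository's actual objectives:

```python
import tensorflow as tf

def regularized_loss(primal_loss, model_variables, dual_variables,
                     l2_weight=1e-4, dual_reg_coeff=1.0):
    """Attach L2 and dual-variable penalties to a base loss (sketch)."""
    # L2 penalty on the network weights.
    l2_term = l2_weight * tf.add_n(
        [tf.nn.l2_loss(v) for v in model_variables])
    # A quadratic penalty on the dual variables is one common choice
    # for keeping the saddle-point problem well conditioned.
    dual_term = dual_reg_coeff * tf.add_n(
        [tf.reduce_mean(tf.square(v)) for v in dual_variables])
    return primal_loss + l2_term + dual_term
```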
## Network Architecture

For neural estimators:
- `hidden_sizes`: List specifying the number of hidden units per layer (e.g., `[256, 256]`).
- `activation`: Activation function used in hidden layers (e.g., `relu`, `tanh`).
- `use_layer_norm`: Whether to apply layer normalization.
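
A hedged sketch of how these three settings might translate into a feed-forward Keras network (the `build_mlp` helper is hypothetical; the repository's network code may differ):

```python
import tensorflow as tf

def build_mlp(hidden_sizes=(256, 256), activation="relu",
              use_layer_norm=False, output_dim=1):
    """Feed-forward estimator network; input shape is inferred on first call."""
    layers = []
    for units in hidden_sizes:
        layers.append(tf.keras.layers.Dense(units, activation=activation))
        if use_layer_norm:
            layers.append(tf.keras.layers.LayerNormalization())
    layers.append(tf.keras.layers.Dense(output_dim))  # linear output head
    return tf.keras.Sequential(layers)

# e.g. a ratio network with two hidden layers of 256 ReLU units:
network = build_mlp(hidden_sizes=[256, 256], activation="relu")
```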
## Evaluation

Estimation settings:
- `num_eval_episodes`: Number of episodes used for on-policy evaluation.
- `normalize_weights`: Whether to normalize importance weights in estimators.
- `clip_weights`: Whether to clip importance weights and, if so, the clipping threshold.
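
The effect of `normalize_weights` and `clip_weights` on a weighted estimate, as a minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def weighted_value_estimate(weights, rewards,
                            normalize_weights=True, clip_weights=None):
    """Average rewards under estimated correction weights (sketch)."""
    w = np.asarray(weights, dtype=np.float64)
    r = np.asarray(rewards, dtype=np.float64)
    if clip_weights is not None:   # cap extreme weights at the threshold
        w = np.minimum(w, clip_weights)
    if normalize_weights:          # self-normalized estimator
        return np.sum(w * r) / np.sum(w)
    return np.mean(w * r)
```

Both options trade a small amount of bias for reduced variance, which is usually worthwhile when the estimated weights are heavy-tailed.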
## Dataset and Logging
- `seed`: Random seed for reproducibility.
- `log_interval`: How frequently (in training iterations) to log metrics.
- `eval_interval`: How frequently to evaluate the current model.
- `save_model`: Whether to save model checkpoints.
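
A skeleton showing how these settings usually govern a training loop; `train_step` and `evaluate` are placeholders, not functions from this repository:

```python
import numpy as np
import tensorflow as tf

def run_training(train_step, evaluate, num_iterations=100_000, seed=0,
                 log_interval=1_000, eval_interval=10_000,
                 save_model=True, checkpoint=None):
    """Seeded training loop with periodic logging, evaluation, and saving."""
    np.random.seed(seed)
    tf.random.set_seed(seed)
    for step in range(1, num_iterations + 1):
        loss = train_step()
        if step % log_interval == 0:
            print(f"step {step}: loss = {loss:.4f}")
        if step % eval_interval == 0:
            print(f"step {step}: estimate = {evaluate():.4f}")
            if save_model and checkpoint is not None:
                checkpoint.save(f"checkpoints/ckpt-{step}")
```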
## Notes
- Specific hyperparameters may vary depending on the algorithm or benchmark.
- See experiment scripts in the repository for actual default values and overrides.