# API 2.1.3. NeuralGradientDice
`NeuralGradientDice` estimates the policy value in the same way as `NeuralGenDice`, but overrides the method `get_loss`.
Just like `NeuralGenDice`, `NeuralGradientDice` supports both the discounted and the undiscounted case, i.e., $0 < \gamma \leq 1$.
Fenchel-Rockafellar duality is applied to the primal GradientDICE objective from `TabularGradientDice` to yield the dual GradientDICE objective.
The dual objective in GradientDICE is the same as in GenDICE, except for a slight modification in the last term of the loss function; the same modification applies to the norm regularization term.
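In the notation of `get_loss` below, with $w$ the stationary distribution correction, $v$ the dual (value-like) function, and $u$ a scalar variable enforcing the normalization $\mathbb{E}_{d^D}[w] = 1$, the dual objective can be sketched as follows (adapted from the GradientDICE paper; the exact form used in the implementation may differ):

$$
\min_{w}\,\max_{v,\,u}\;
(1-\gamma)\,\mathbb{E}_{s_0 \sim \mu_0,\, a_0 \sim \pi}\big[v(s_0,a_0)\big]
+ \gamma\,\mathbb{E}_{(s,a,s') \sim d^D,\, a' \sim \pi}\big[w(s,a)\,v(s',a')\big]
- \mathbb{E}_{(s,a) \sim d^D}\big[w(s,a)\,v(s,a)\big]
- \tfrac{1}{2}\,\mathbb{E}_{(s,a) \sim d^D}\big[v(s,a)^2\big]
+ \lambda\Big(\mathbb{E}_{(s,a) \sim d^D}\big[u\,w(s,a)\big] - u - \tfrac{u^2}{2}\Big)
$$

Note that, unlike in GenDICE, the quadratic term in $v$ is not weighted by $w$.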
For further details, refer to the original paper: GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values.
```python
def __init__(
    self,
    gamma, lamda, seed, batch_size,
    learning_rate, hidden_dimensions,
    obs_min, obs_max, n_act, obs_shape,
    dataset, preprocess_obs=None, preprocess_act=None, preprocess_rew=None,
    dir=None, get_recordings=None, other_hyperparameters=None, save_interval=100):
```
Args:
- All the arguments of `NeuralGenDice` are inherited.
```python
def get_loss(self, v_init, v, v_next, w):
```
Overrides the base class `get_loss` to compute the dual GradientDICE objective.
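As an illustration only, a minimal sketch of such an override, assuming a TensorFlow-based implementation in which `self.gamma` and `self.lamda` hold the corresponding constructor arguments and `self.u` is an extra trainable scalar for the norm constraint (these attribute names and the sign conventions are assumptions, not the library's actual code):

```python
import tensorflow as tf

def get_loss(self, v_init, v, v_next, w):
    # Hypothetical sketch of the dual GradientDICE objective.
    #   v_init: v(s_0, a_0) with s_0 ~ mu_0 and a_0 ~ pi
    #   v:      v(s, a)     for dataset samples (s, a)
    #   v_next: v(s', a')   with a' ~ pi, for successor states s'
    #   w:      w(s, a)     stationary distribution correction
    u = self.u  # assumed trainable scalar dualizing the constraint E_D[w] = 1

    bellman_flow = (
        (1.0 - self.gamma) * tf.reduce_mean(v_init)
        + self.gamma * tf.reduce_mean(w * v_next)
        - tf.reduce_mean(w * v)
    )
    # GradientDICE: the quadratic term in v is *not* weighted by w (unlike GenDICE).
    quadratic = -0.5 * tf.reduce_mean(v ** 2)
    # Dualized norm regularization lamda / 2 * (E_D[w] - 1)^2.
    norm_regularization = self.lamda * (tf.reduce_mean(u * w) - u - 0.5 * u ** 2)

    # Saddle-point objective: minimized w.r.t. w, maximized w.r.t. v and u.
    return bellman_flow + quadratic + norm_regularization
```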
```python
from some_module import NeuralGradientDice  # replace some_module with the actual package path

estimator = NeuralGradientDice(
    gamma=0.99,                  # discount factor
    lamda=0.5,                   # coefficient of the norm regularization term
    seed=0,
    batch_size=64,
    learning_rate=1e-3,
    hidden_dimensions=(64, 64),  # hidden layer sizes of the networks
    obs_min=obs_min,             # observation bounds, e.g. taken from the dataset
    obs_max=obs_max,
    n_act=4,                     # number of discrete actions
    obs_shape=(8,),
    dataset=df,                  # offline dataset
    dir="./logs",                # output directory
)

estimator.evaluate_loop(n_steps=10_000)      # run the optimization loop
rho_hat = estimator.solve_pv(weighted=True)  # estimated policy value
```
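For intuition, `solve_pv` presumably turns the learned correction ratios into a policy-value estimate roughly along these lines; this is a hypothetical sketch, not the library's code, and `weighted=True` is read here as self-normalizing the weights:

```python
import numpy as np

def policy_value_from_ratios(w, rewards, weighted=True):
    """Estimate rho(pi) = E_{d^D}[w(s, a) * r(s, a)] from dataset samples."""
    w = np.asarray(w, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    if weighted:
        # Self-normalized (weighted) estimate: sum(w * r) / sum(w).
        return float(np.sum(w * rewards) / np.sum(w))
    # Plain estimate: mean(w * r).
    return float(np.mean(w * rewards))
```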