Policy evaluation - Atcold/pytorch-PPUU Wiki

Goal

We would like to compare the performance of different policies.

Definitions

Case study

3 experiments:

  1. Stochastic policy -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-gauss-model=vae-zdropout=0.5-policy-gauss-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
  2. Deterministic policy, regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
  3. Deterministic policy, non-regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v13/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=False
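
The checkpoint names above encode their settings as `key=value` fields, so the three experiments can be compared programmatically. The following is a minimal sketch, not part of the repository: the helper names `parse_model_path` and `differing_settings` and the experiment labels are hypothetical, and it assumes that every setting appears as `key=value` separated by `-`, with the policy type sitting between `policy-` and `-model=`. Values are kept as raw strings, since the names mix `1` and `False` for `learnedcost`.

```python
import re
from pathlib import PurePosixPath

# Common prefix of the three checkpoint paths listed above.
ROOT = "/misc/vlgscratch4/LecunGroup/nvidia-collab"
COMMON = ("nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0"
          "-gamma=0.99-lrtz=0.0-updatez=0-inferz=0")

# Labels are hypothetical; the paths are the ones from the list above.
PATHS = {
    "stochastic": f"{ROOT}/models_v12/policy_networks/"
                  f"MPUR-policy-gauss-model=vae-zdropout=0.5-policy-gauss-{COMMON}-learnedcost=1",
    "deterministic_regressed": f"{ROOT}/models_v12/policy_networks/"
                  f"MPUR-policy-deterministic-model=vae-zdropout=0.5-{COMMON}-learnedcost=1",
    "deterministic_non_regressed": f"{ROOT}/models_v13/policy_networks/"
                  f"MPUR-policy-deterministic-model=vae-zdropout=0.5-{COMMON}-learnedcost=False",
}


def parse_model_path(path):
    """Extract the key=value settings encoded in a checkpoint name (values stay strings)."""
    p = PurePosixPath(path)
    settings = dict(re.findall(r"(\w+)=([^-]+)", p.name))
    # Assumption: the policy type is the token between "policy-" and "-model=".
    match = re.search(r"policy-(\w+)-model=", p.name)
    settings["policy"] = match.group(1) if match else None
    # Grandparent directory, e.g. "models_v12" or "models_v13".
    settings["models_dir"] = p.parent.parent.name
    return settings


def differing_settings(configs):
    """Return only the settings whose values are not identical across all experiments."""
    keys = sorted(set().union(*configs.values()))
    return {
        key: {name: cfg.get(key) for name, cfg in configs.items()}
        for key in keys
        if len({cfg.get(key) for cfg in configs.values()}) > 1
    }


configs = {name: parse_model_path(path) for name, path in PATHS.items()}
diff = differing_settings(configs)
for key, values in diff.items():
    print(key, values)
```

Under these assumptions, the only fields that differ between the three runs are the policy type, the `learnedcost` flag, and the `models_v12` / `models_v13` directory; all other hyperparameters encoded in the names are shared.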

Jupyter resources