Policy evaluation - Atcold/pytorch-PPUU GitHub Wiki

Goal

We would like to compare the performance of different policies.

Definitions

  • Experiment: English description of what we're trying to do with a given policy
  • Seed: each experiment is run multiple times so that we can capture the variability of the results
  • Checkpoint: policies are saved at different points in time, and their performance changes noticeably across saved instances
  • Episode: each evaluation consists of computing the success rate across 560 episodes (the I-80 test set)
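Putting the definitions together, one evaluation point is a (checkpoint, seed) pair, and the per-seed success rates are aggregated to capture variability. A minimal sketch of that aggregation, with hypothetical numbers standing in for real evaluation output:

```python
import statistics

# Hypothetical per-seed success rates: for each checkpoint, the fraction
# of the 560 I-80 test episodes that succeed, one entry per seed.
# Real values come from the evaluation runs; these are placeholders.
results = {
    "checkpoint_25000": [0.71, 0.74, 0.69],
    "checkpoint_50000": [0.78, 0.80, 0.77],
}

def summarize(per_seed_rates):
    """Return (mean, stdev) of the success rate across seeds."""
    mean = statistics.mean(per_seed_rates)
    stdev = statistics.stdev(per_seed_rates) if len(per_seed_rates) > 1 else 0.0
    return mean, stdev

for name, rates in results.items():
    mean, stdev = summarize(rates)
    print(f"{name}: {mean:.3f} ± {stdev:.3f} over {len(rates)} seeds")
```

Reporting mean ± standard deviation across seeds is what lets us say whether two policies actually differ, rather than comparing single noisy runs.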

Case study

3 experiments:

  1. Stochastic policy -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-gauss-model=vae-zdropout=0.5-policy-gauss-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
  2. Deterministic policy, regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v12/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=1
  3. Non-regressed cost -> /misc/vlgscratch4/LecunGroup/nvidia-collab/models_v13/policy_networks/MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256-bsize=6-npred=30-ureg=0.05-lambdal=0.2-lambdaa=0.0-gamma=0.99-lrtz=0.0-updatez=0-inferz=0-learnedcost=False
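The directory names above encode the training hyperparameters as dash-separated `key=value` tokens (e.g. `model=vae`, `learnedcost=False`). A small sketch to recover them programmatically; the helper name is ours, not part of the repo:

```python
def parse_model_name(name):
    """Extract key=value hyperparameters from a model directory name
    like 'MPUR-policy-deterministic-model=vae-zdropout=0.5-...'.
    Tokens without '=' (e.g. 'MPUR', 'policy') are skipped."""
    options = {}
    for token in name.split("-"):
        if "=" in token:
            key, value = token.split("=", 1)
            options[key] = value
    return options

opts = parse_model_name(
    "MPUR-policy-deterministic-model=vae-zdropout=0.5-nfeature=256"
    "-bsize=6-npred=30-learnedcost=False"
)
print(opts["model"], opts["learnedcost"])
```

This makes it easy to tell the three experiments apart by the fields that actually differ, such as `learnedcost` and the `gauss` vs. `deterministic` policy type.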

Jupyter resources