from Demonstrations (fD)

DDPGfD (Vecerik et al., 2017) is an imitation learning algorithm that injects demonstration data into the experience replay buffer. DDPGfD also improves on DDPG by (1) using prioritized experience replay (Schaul et al., 2015), (2) adding n-step returns, (3) performing multiple learning updates per environment step, and (4) adding L2 regularizers to the actor and critic losses. We incorporated these improvements into TD3 and SAC and found that they dramatically improve their performance.
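
Below is a minimal sketch (not the repo's actual implementation) of how these fD-style modifications fit together in an off-policy actor-critic update. The `PrioritizedReplayBuffer` interface, the `actor`/`critic` modules, and all hyperparameter values are hypothetical placeholders used only for illustration; the buffer is assumed to already store n-step transitions and to be pre-filled with demonstration data before training starts.

```python
import torch


def train_step(buffer, actor, critic, actor_opt, critic_opt,
               gamma=0.99, n_step=5, n_updates=4, l2_coef=1e-4):
    """One environment step worth of learning: several gradient updates drawn
    from a prioritized replay buffer that was pre-loaded with demonstrations."""
    gamma_n = gamma ** n_step  # n-step bootstrap discount

    for _ in range(n_updates):
        # Prioritized sampling returns importance-sampling weights and indices.
        # `n_step_reward` is the discounted sum of the next n rewards;
        # `next_obs` / `done` refer to the state n steps ahead.
        obs, act, n_step_reward, next_obs, done, weights, idxs = buffer.sample()

        with torch.no_grad():
            next_q = critic(next_obs, actor(next_obs))
            target = n_step_reward + (1.0 - done) * gamma_n * next_q

        # Critic: importance-weighted TD error plus L2 regularization
        q = critic(obs, act)
        td_error = target - q
        critic_loss = (weights * td_error.pow(2)).mean()
        critic_loss = critic_loss + l2_coef * sum(
            p.pow(2).sum() for p in critic.parameters())

        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Actor: deterministic policy gradient plus L2 regularization
        actor_loss = -critic(obs, actor(obs)).mean()
        actor_loss = actor_loss + l2_coef * sum(
            p.pow(2).sum() for p in actor.parameters())

        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Transitions with large TD error (often the demonstrations early in
        # training) get higher priority and are replayed more often.
        buffer.update_priorities(idxs, td_error.abs().detach().cpu().numpy() + 1e-6)
```

The same skeleton applies to TD3 and SAC: only the critic target (twin critics, target smoothing, entropy term) and the actor loss change, while the demonstration-filled prioritized buffer, n-step targets, multiple updates per step, and L2 terms stay as shown.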