Soft Actor Critic (SAC) - kairproject/kair_algorithms_draft GitHub Wiki

SAC (Haarnoja et al., 2018a) is built on maximum entropy reinforcement learning, in which the agent aims to maximize both the expected reward and the entropy of its policy. Combined with techniques from TD3 such as clipped double-Q learning, SAC achieves state-of-the-art performance on a range of continuous control tasks. SAC was later extended to tune the temperature parameter automatically (Haarnoja et al., 2018b). The temperature determines the relative importance of the entropy term against the expected reward.
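The maximum entropy objective from Haarnoja et al. (2018a) is J(π) = Σ_t E[r(s_t, a_t) + α·H(π(·|s_t))], where α is the temperature. Below is a minimal NumPy sketch of the two places α enters training: the soft Bellman target for the critics and the automatic temperature update. It is an illustration, not code from this repository. Assumptions: two target critics whose minimum is taken (the clipped double-Q trick from TD3), α parameterized as log α, and a target entropy of -dim(A), a common heuristic from Haarnoja et al. (2018b). The function names `soft_q_target` and `temperature_step` are hypothetical.

```python
import numpy as np


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob, alpha, gamma=0.99):
    """Soft Bellman target: r + gamma * (1 - done) * (min(Q1, Q2) - alpha * log pi(a'|s'))."""
    # Clipped double-Q (assumed, borrowed from TD3): use the smaller target critic.
    min_next_q = np.minimum(next_q1, next_q2)
    # Subtracting alpha * log pi(a'|s') adds a sampled entropy bonus weighted by alpha.
    soft_value = min_next_q - alpha * next_log_prob
    # Terminal transitions (done = 1) do not bootstrap.
    return reward + gamma * (1.0 - done) * soft_value


def temperature_step(log_alpha, log_prob, target_entropy, lr=3e-4):
    """One gradient step on loss(log_alpha) = -mean(log_alpha * (log_prob + target_entropy)).

    log_prob holds log pi(a|s) for actions sampled from the current policy and is
    treated as a constant, so no gradient flows back into the policy here.
    """
    # d loss / d log_alpha = -mean(log_prob + target_entropy)
    grad = -np.mean(log_prob + target_entropy)
    # Plain gradient descent (assumed; an optimizer such as Adam is common in practice).
    return log_alpha - lr * grad


# Hypothetical batch of three transitions.
y = soft_q_target(
    reward=np.array([1.0, 0.0, -1.0]),
    done=np.array([0.0, 0.0, 1.0]),
    next_q1=np.array([10.0, 5.0, 2.0]),
    next_q2=np.array([9.0, 6.0, 3.0]),
    next_log_prob=np.array([-1.0, -2.0, -0.5]),
    alpha=0.2,
)

# 2-D action space, so the assumed target entropy is -2.
new_log_alpha = temperature_step(0.0, np.array([-0.5, -1.0]), target_entropy=-2.0, lr=0.1)
alpha = np.exp(new_log_alpha)
```

In the usage example the policy's estimated entropy (0.75) exceeds the target (-2), so the step lowers log α and the entropy bonus shrinks. If the entropy fell below the target, the same update would raise α. Optimizing log α rather than α keeps the temperature positive without an explicit constraint.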

Example Script on LunarLander
arXiv Preprint (Original SAC)
arXiv Preprint (SAC with autotuned temperature)