Improved Soft Actor-critic: Mixing Prioritized Off-policy Samples With On-policy Experience
2021 · Chayan Banerjee, Zhiyong Chen, Nasimul Noman
Abstract
Soft Actor-Critic (SAC) is an off-policy actor-critic reinforcement learning algorithm based on entropy regularization. SAC trains a policy by maximizing a trade-off between expected return and entropy (randomness in the policy). It has achieved state-of-the-art performance on a range of continuous-control benchmark tasks, outperforming prior on-policy and off-policy methods. SAC works in an off-policy fashion: data are sampled uniformly from past experiences stored in a replay buffer, and these samples are used to update the parameters of the policy and value function networks. We propose modifications that boost the performance of SAC and make it more sample-efficient. In our improved SAC, we first introduce a new prioritization scheme for selecting better samples from the experience replay buffer. Second, we train the policy and value function networks on a mixture of the prioritized off-policy data and the latest on-policy data. We compare ou
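The idea of building each training minibatch from two sources can be illustrated with a small replay-buffer sketch. The abstract does not describe the paper's own prioritization scheme, so this sketch swaps in generic proportional prioritization from prioritized experience replay (PER): the caller supplies a priority per transition (for example, an absolute TD error). The most recently stored transitions serve as an approximation of on-policy data. The class name `MixedReplayBuffer` and the parameters `alpha`, `eps`, and `on_policy_fraction` are hypothetical and are not the authors' implementation.

```python
import numpy as np


class MixedReplayBuffer:
    """Hypothetical ring buffer that builds each minibatch from two sources:
    the most recent transitions (a proxy for on-policy data) and older
    transitions drawn in proportion to a caller-supplied priority."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent (assumption, as in PER)
        self.eps = eps              # keeps zero-priority items sampleable
        self.storage = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0                # index the next transition is written to
        self.rng = np.random.default_rng(seed)

    def add(self, transition, priority=None):
        # New items default to the current maximum priority (1.0 if empty).
        if priority is None:
            n = len(self.storage)
            priority = self.priorities[:n].max() if n else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition  # overwrite the oldest item
        self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def update_priorities(self, indices, new_priorities):
        # e.g. absolute TD errors after a gradient step (hypothetical choice)
        self.priorities[indices] = new_priorities

    def _recent_indices(self, k):
        # Indices of the k newest transitions, newest first, with wrap-around.
        k = min(k, len(self.storage))
        return [(self.pos - 1 - i) % self.capacity for i in range(k)]

    def sample(self, batch_size, on_policy_fraction=0.5):
        n = len(self.storage)
        n_recent = round(batch_size * on_policy_fraction)
        recent = self._recent_indices(n_recent)
        # Fill the rest of the batch by proportional prioritized sampling.
        p = (self.priorities[:n] + self.eps) ** self.alpha
        p /= p.sum()
        prioritized = self.rng.choice(n, size=batch_size - n_recent, p=p).tolist()
        idx = recent + prioritized
        return idx, [self.storage[i] for i in idx]
```

In this sketch the recent transitions are included deterministically, while the remainder of the batch is drawn stochastically by priority. A full PER-style learner would also correct the resulting sampling bias with importance-sampling weights. That step is omitted here.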
Related papers
- Boosting Soft Actor-critic: Emphasizing Recent Experience Without Forgetting The Past (2019)
- Band-limited Soft Actor Critic Model (2020)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)
- Revisiting Discrete Soft Actor-critic (2022)
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023)
- SARC: Soft Actor Retrospective Critic (2023)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- DSAC: Distributional Soft Actor-critic For Risk-sensitive Reinforcement Learning (2020)