Boosting Soft Actor-critic: Emphasizing Recent Experience Without Forgetting The Past
2019 · Che Wang, Keith Ross
Abstract
Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning (DRL) algorithm based on maximum entropy reinforcement learning. By combining off-policy updates with an actor-critic formulation, SAC achieves state-of-the-art performance on a range of continuous-action benchmark tasks, outperforming prior on-policy and off-policy methods. The off-policy method employed by SAC samples data uniformly from past experience when performing parameter updates. We propose Emphasizing Recent Experience (ERE), a simple but powerful off-policy sampling technique, which emphasizes recently observed data while not forgetting the past. The ERE algorithm samples more aggressively from recent experience, and also orders the updates to ensure that updates from old data do not overwrite updates from new data. We compare vanilla SAC and SAC+ERE, and show that ERE is more sample efficient than vanilla SAC for continuous-action MuJoCo tasks. We also consider combining SAC with Priority Exp
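The two ideas in the abstract, sampling more heavily from recent data and ordering updates so that the newest data is used last, can be illustrated with a minimal sketch. The geometric shrinking schedule, the parameter names `eta` and `c_min`, and their default values are assumptions chosen for illustration; the abstract does not specify them. The sketch assumes the replay buffer is ordered oldest to newest, so index `buffer_size - 1` is the most recent transition.

```python
import random


def ere_sample_ranges(buffer_size, num_updates, eta=0.996, c_min=5000):
    """For each update k = 1..K, return how many of the most recent
    transitions the mini-batch is drawn from.

    Assumed schedule (illustrative): the range shrinks geometrically with k,
    so early updates see nearly the whole buffer and later updates
    concentrate on recent data. c_min keeps the range from collapsing.
    """
    ranges = []
    for k in range(1, num_updates + 1):
        c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
        # Never go below c_min, and never exceed what the buffer holds.
        ranges.append(min(buffer_size, max(c_k, c_min)))
    return ranges


def ere_minibatch_indices(buffer_size, num_updates, batch_size,
                          eta=0.996, c_min=5000, rng=None):
    """Return one list of buffer indices per update, in update order.

    Each mini-batch is sampled uniformly from the c_k most recent
    transitions. Because c_k is non-increasing in k, updates that include
    old data happen first and updates on the newest data happen last.
    """
    rng = rng or random.Random(0)
    batches = []
    for c_k in ere_sample_ranges(buffer_size, num_updates, eta, c_min):
        start = buffer_size - c_k
        batches.append([rng.randrange(start, buffer_size)
                        for _ in range(batch_size)])
    return batches
```

Setting `eta=1.0` makes every range equal to the full buffer, which recovers the uniform sampling used by vanilla SAC.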
Related papers
- Improved Soft Actor-critic: Mixing Prioritized Off-policy Samples With On-policy Experience (2021)
- Context-based Soft Actor Critic For Environments With Non-stationary Dynamics (2021)
- Revisiting Discrete Soft Actor-critic (2022)
- SARC: Soft Actor Retrospective Critic (2023)
- Learning From Demonstrations With SACR2: Soft Actor-critic With Reward Relabeling (2021)
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-actor Critic (2023)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)