Striving For Simplicity And Performance In Off-policy DRL: Output Normalization And Non-uniform Sampling
2019 Β· Che Wang, Yanqiu Wu, Quan Vuong, et al.
Abstract
We aim to develop off-policy DRL algorithms that not only exceed state-of-the-art performance but are also simple and minimalistic. For standard continuous control benchmarks, Soft Actor-Critic (SAC), which employs entropy maximization, currently provides state-of-the-art performance. We first demonstrate that the entropy term in SAC addresses action saturation due to the bounded nature of the action spaces, with this insight, we propose a streamlined algorithm with a simple normalization scheme or with inverted gradients. We show that both approaches can match SAC's sample efficiency performance without the need of entropy maximization, we then propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. Extensive experimental results demonstrate that our proposed sampling scheme leads to state of the art sample efficiency on challenging continuous control tasks. We combine all of our findings into one simple algorithm, which we call S
Authors
(none)
Tags
Stats
Related papers
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)0.00
- Discrete And Continuous Action Representation For Practical RL In Video Games (2019)0.00
- DDPG++: Striving For Simplicity In Continuous-control Off-policy Reinforcement Learning (2020)0.00
- Improved Soft Actor-critic: Mixing Prioritized Off-policy Samples With On-policy Experience (2021)0.00
- Revisiting Discrete Soft Actor-critic (2022)0.00
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025)0.00
- Boosting Soft Actor-critic: Emphasizing Recent Experience Without Forgetting The Past (2019)0.00
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)0.00