Revisiting Discrete Soft Actor-Critic
2022 · Haibin Zhou, Tong Wei, Zichuan Lin, et al.
Abstract
We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous action spaces to discrete action spaces. We revisit vanilla discrete SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We then propose Stable Discrete SAC (SDSAC), an algorithm that uses an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of the proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
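The abstract names "double average Q-learning" without giving formulas. As a rough illustration only, the sketch below computes the soft state value for a discrete action space as an exact expectation over actions, and contrasts averaging two Q estimates with the clipped double-Q minimum that is often used in SAC implementations. The function names, array shapes, and the `reduce` switch are assumptions made for this example and are not taken from the paper or its repository; the entropy penalty and Q-clip components are not shown.

```python
import numpy as np

def soft_state_value(q1, q2, probs, alpha, reduce="mean"):
    """Soft state value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).

    q1, q2 : Q estimates from two critics, shape (batch, n_actions)
    probs  : policy probabilities, shape (batch, n_actions), rows sum to 1
    alpha  : entropy temperature
    reduce : "mean" averages the two critics (assumed reading of
             "double average"); "min" takes the element-wise minimum
    """
    if reduce == "mean":
        q = 0.5 * (q1 + q2)
    elif reduce == "min":
        q = np.minimum(q1, q2)
    else:
        raise ValueError("reduce must be 'mean' or 'min'")
    # Clip probabilities before the log to avoid log(0).
    log_probs = np.log(np.clip(probs, 1e-8, 1.0))
    return np.sum(probs * (q - alpha * log_probs), axis=-1)

def td_target(reward, done, next_value, gamma=0.99):
    """One-step soft TD target; bootstrapping is cut off on terminal steps."""
    return reward + gamma * (1.0 - done) * next_value

# Toy batch with one state and two actions (hypothetical numbers).
q1 = np.array([[1.0, 2.0]])
q2 = np.array([[3.0, 0.0]])
probs = np.array([[0.5, 0.5]])

v_mean = soft_state_value(q1, q2, probs, alpha=0.0, reduce="mean")
v_min = soft_state_value(q1, q2, probs, alpha=0.0, reduce="min")
```

Because the element-wise minimum can never exceed the average, the "min" value is a lower bound on the "mean" value for the same inputs, which gives some intuition for why a minimum-based target may push Q estimates downward.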
Related papers
- DSAC-C: Constrained Maximum Entropy For Robust Discrete Soft-Actor Critic (2023)
- DSAC: Distributional Soft Actor-Critic For Risk-sensitive Reinforcement Learning (2020)
- Discrete And Continuous Action Representation For Practical RL In Video Games (2019)
- Boosting Soft Actor-Critic: Emphasizing Recent Experience Without Forgetting The Past (2019)
- DR-SAC: Distributionally Robust Soft Actor-Critic For Reinforcement Learning Under Uncertainty (2025)
- Improved Soft Actor-Critic: Mixing Prioritized Off-policy Samples With On-policy Experience (2021)
- Improving Exploration In Soft-Actor-Critic With Normalizing Flows Policies (2019)
- Distributional Soft Actor-Critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020)