An Entropy Regularization Free Mechanism For Policy-based Reinforcement Learning
2021 · Changnan Xiao, Haosen Shi, Jiajun Fan, et al.
Abstract
Policy-based reinforcement learning methods suffer from the policy collapse problem. We find that value-based reinforcement learning methods with the $\epsilon$-greedy mechanism enjoy three characteristics, Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off, which help value-based methods avoid the policy collapse problem. However, no parallel mechanism exists for policy-based methods that achieves all three characteristics. In this paper, we propose an entropy-regularization-free mechanism designed for policy-based methods that achieves Closed-form Diversity, Objective-invariant Exploration, and Adaptive Trade-off. Our experiments show that our mechanism is highly sample-efficient for policy-based methods and boosts a policy-based baseline to a new state of the art on the Arcade Learning Environment.
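For readers unfamiliar with the $\epsilon$-greedy mechanism the abstract refers to, the following minimal sketch shows the action distribution that standard $\epsilon$-greedy induces over a set of Q-values. It illustrates only the textbook value-based mechanism, not the mechanism proposed in the paper. The function names `epsilon_greedy_probs` and `entropy` and the example Q-values are assumptions made for illustration. The point is that the exploration distribution, and hence its entropy, can be written down directly from $\epsilon$ and the number of actions, which is presumably the sense in which the abstract speaks of diversity in closed form.

```python
import math


def epsilon_greedy_probs(q_values, epsilon):
    """Action distribution of standard epsilon-greedy (hypothetical helper).

    Every action receives epsilon / n probability mass; the greedy action
    receives the remaining 1 - epsilon on top of that.
    """
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # first argmax on ties
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability actions."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Illustrative Q-values (assumed, not from the paper).
probs = epsilon_greedy_probs([1.0, 3.0, 2.0], epsilon=0.1)
policy_entropy = entropy(probs)
```

With $\epsilon = 0$ the distribution is deterministic and its entropy is zero; any $\epsilon > 0$ keeps every action's probability at or above $\epsilon / n$ regardless of how the Q-values evolve.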
Related papers
- Understanding The Impact Of Entropy On Policy Optimization (2018)
- Arbitrary Entropy Policy Optimization Breaks The Exploration Bottleneck Of Reinforcement Learning (2025)
- Diversity Actor-critic: Sample-aware Entropy Regularization For Sample-efficient Exploration (2020)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Increasing Entropy To Boost Policy Gradient Performance On Personalization Tasks (2023)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- EPO: Entropy-regularized Policy Optimization For LLM Agents Reinforcement Learning (2025)
- Implicit Policy For Reinforcement Learning (2018)