Entropy Augmented Reinforcement Learning
2022 · Jianfei Ma
Abstract
Deep reinforcement learning was spurred by the advent of trust region methods, which are scalable and efficient. However, the pessimism of such algorithms, which force the policy to stay within a trust region at all costs, has been shown to suppress exploration and harm performance. Exploratory algorithms such as SAC use entropy to encourage exploration, but in doing so they implicitly optimize a different objective. We first identify this inconsistency and then propose an analogous augmentation technique that combines well with on-policy algorithms when a value critic is involved. Surprisingly, the proposed method consistently satisfies the soft policy improvement theorem while being more extensible. As the analysis suggests, it is crucial to control the temperature coefficient to balance exploration and exploitation. Empirical tests on MuJoCo benchmark tasks show that the agent is encouraged toward higher-reward regions and achieves better performance.
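The abstract does not spell out the exact form of the augmentation. As a point of reference only, the sketch below shows one common way to fold an entropy term into an on-policy advantage estimate that uses a value critic: add the SAC-style bonus −α log π(a_t|s_t) to each reward, then run generalized advantage estimation (GAE). The function names `entropy_augmented_rewards` and `soft_gae`, the use of GAE, and the default γ and λ are assumptions for illustration, not the paper's method.

```python
from typing import List, Sequence, Tuple


def entropy_augmented_rewards(
    rewards: Sequence[float], log_probs: Sequence[float], alpha: float
) -> List[float]:
    """Return r_t - alpha * log pi(a_t | s_t) for every step.

    In expectation over a_t ~ pi(.|s_t), -log pi(a_t|s_t) equals the policy
    entropy at s_t, so this adds a temperature-scaled entropy bonus.
    (Illustrative assumption, not necessarily the paper's formulation.)
    """
    if alpha < 0:
        raise ValueError("temperature alpha must be non-negative")
    if len(rewards) != len(log_probs):
        raise ValueError("rewards and log_probs must have the same length")
    return [r - alpha * lp for r, lp in zip(rewards, log_probs)]


def soft_gae(
    rewards: Sequence[float],
    log_probs: Sequence[float],
    values: Sequence[float],
    last_value: float,
    alpha: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """GAE over entropy-augmented rewards, using a value critic V(s).

    values[t] is V(s_t); last_value is the bootstrap V(s_T) (use 0.0 if the
    episode terminated). Returns (advantages, value targets).
    """
    if len(values) != len(rewards):
        raise ValueError("values and rewards must have the same length")
    aug = entropy_augmented_rewards(rewards, log_probs, alpha)
    advantages = [0.0] * len(aug)
    gae = 0.0
    next_value = last_value
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(aug))):
        delta = aug[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
        next_value = values[t]
    # Targets for fitting the (soft) value critic.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    log_probs = [-0.5, -1.2, -0.3]  # log pi(a_t|s_t) under the sampling policy
    values = [0.8, 0.5, 0.9]        # critic estimates V(s_t)
    for alpha in (0.0, 0.2):
        adv, _ = soft_gae(rewards, log_probs, values, last_value=0.0, alpha=alpha)
        print(alpha, [round(a, 3) for a in adv])
```

Setting `alpha=0.0` recovers ordinary GAE, while a larger temperature raises the advantage of actions taken under a high-entropy policy, which is the exploration/exploitation trade-off the abstract says must be controlled.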
Related papers
- Policy Augmentation: An Exploration Strategy For Faster Convergence Of Deep Reinforcement Learning Algorithms (2021) · 2.26
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019) · 0.00
- Experience Augmentation: Boosting And Accelerating Off-policy Multi-agent Reinforcement Learning (2020) · 0.00
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019) · 0.00
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019) · 0.00
- Exploring More When It Needs In Deep Reinforcement Learning (2021) · 0.00
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016) · 0.00
- Entropy-augmented Entropy-regularized Reinforcement Learning And A Continuous Path From Policy Gradient To Q-learning (2020) · 0.00