The Exploration-exploitation Dilemma Revisited: An Entropy Perspective
2024 · Renye Yan, Yaozhong Gan, You Wu, et al.
Abstract
The imbalance of exploration and exploitation has long been a significant challenge in reinforcement learning. In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima. This paper revisits the exploration-exploitation dilemma from the perspective of entropy by revealing the relationship between entropy and the dynamic adaptive process of exploration and exploitation. Based on this theoretical insight, we establish an end-to-end adaptive framework called AdaZero, which automatically determines whether to explore or to exploit as well as their balance of strength. Experiments show that AdaZero significantly outperforms baseline models across various Atari and MuJoCo environments with only a single setting. Especially in the challenging environment of Montezuma, AdaZero boosts the final returns by up to fifteen times. Moreover, we conduct a series of visualization analyses to reveal the d
Related papers
- Exploration Conscious Reinforcement Learning Revisited (2018)
- Exploitation Is All You Need... For Exploration (2025)
- MULEX: Disentangling Exploitation From Exploration In Deep RL (2019)
- Maximum Entropy Exploration Without The Rollouts (2026)
- Arbitrary Entropy Policy Optimization Breaks The Exploration Bottleneck Of Reinforcement Learning (2025)
- Exploration Versus Exploitation In Reinforcement Learning: A Stochastic Control Approach (2018)
- Maximum Entropy Diverse Exploration: Disentangling Maximum Entropy Reinforcement Learning (2019)
- Agentic Entropy-balanced Policy Optimization (2025)