VDSC: Enhancing Exploration Timing With Value Discrepancy And State Counts
2024 · Marius Captari, Remo Sasso, Matthia Sabatelli
Abstract
Despite the considerable attention given to the questions of \textit{how much} and \textit{how to} explore in deep reinforcement learning, the investigation into \textit{when} to explore remains relatively less researched. While more sophisticated exploration strategies can excel in specific, often sparse reward environments, existing simpler approaches, such as \(\epsilon\)-greedy, persist in outperforming them across a broader spectrum of domains. The appeal of these simpler strategies lies in their ease of implementation and generality across a wide range of domains. The downside is that these methods are essentially a blind switching mechanism, which completely disregards the agent's internal state. In this paper, we propose to leverage the agent's internal state to decide \textit{when} to explore, addressing the shortcomings of blind switching mechanisms. We present Value Discrepancy and State Counts through homeostasis (VDSC), a novel approach for efficient exploration ti
Related papers
- Accelerating Reinforcement Learning With Value-conditional State Entropy Exploration (2023)
- SVDE: Scalable Value-decomposition Exploration For Cooperative Multi-agent Reinforcement Learning (2023)
- Temporal Difference Uncertainties As A Signal For Exploration (2020)
- Rewarding Episodic Visitation Discrepancy For Exploration In Reinforcement Learning (2022)
- Long-term Visitation Value For Deep Exploration In Sparse Reward Reinforcement Learning (2020)
- Neighboring State-based Exploration For Reinforcement Learning (2022)
- Exploration Conscious Reinforcement Learning Revisited (2018)
- Exploration In Feature Space For Reinforcement Learning (2017)