VDSC: Enhancing Exploration Timing With Value Discrepancy And State Counts

Abstract

Despite the considerable attention given to the questions of \textit{how much} and \textit{how to} explore in deep reinforcement learning, the question of \textit{when} to explore remains comparatively under-researched. While more sophisticated exploration strategies can excel in specific, often sparse-reward environments, simpler existing approaches, such as \(\epsilon\)-greedy, continue to outperform them across a broader range of domains. The appeal of these simpler strategies lies in their ease of implementation and their generality. The downside is that these methods are essentially blind switching mechanisms that completely disregard the agent's internal state. In this paper, we propose to leverage the agent's internal state to decide \textit{when} to explore, addressing the shortcomings of blind switching mechanisms. We present Value Discrepancy and State Counts through homeostasis (VDSC), a novel approach for efficient exploration ti
