Abstract

While many sophisticated exploration methods have been proposed, their lack of generality and high computational cost often lead researchers to favor simpler methods like \(\epsilon\)-greedy. Motivated by this, we introduce \(\beta\)-DQN, a simple and efficient exploration method that augments the standard DQN with a behavior function \(\beta\). This function estimates the probability that each action has been taken at each state. By leveraging \(\beta\), we generate a population of diverse policies that balance exploration between state-action coverage and overestimation bias correction. An adaptive meta-controller is designed to select an effective policy for each episode, enabling flexible and explainable exploration. \(\beta\)-DQN is straightforward to implement and adds minimal computational overhead to the standard DQN. Experiments on both simple and challenging exploration domains show that \(\beta\)-DQN outperforms existing baseline methods across a wide range of tasks, providi
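The abstract does not say how β, the policy population, or the meta-controller are implemented. The sketch below illustrates how the three pieces could fit together, under loudly stated assumptions. β is a tabular, count-based estimate of how often each action was taken in each state; the abstract does not say how β is represented. The population is a hypothetical family of policies, each mixing a "coverage" action (the least-taken action under β) with the greedy action on Q. The per-episode meta-controller is a UCB1 bandit. The names `TabularBeta`, `make_policy`, and `UCBMetaController` are hypothetical. The sketch does not attempt the paper's overestimation-bias-correction policies, which the abstract does not describe.

```python
import math
import random
from collections import defaultdict


class TabularBeta:
    """Hypothetical count-based stand-in for the behavior function beta(a|s):
    the empirical frequency with which each action was taken in each state."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = defaultdict(lambda: [0] * n_actions)

    def update(self, state, action):
        self.counts[state][action] += 1

    def prob(self, state):
        c = self.counts[state]
        total = sum(c)
        if total == 0:
            return [1.0 / self.n_actions] * self.n_actions  # unseen state: uniform
        return [x / total for x in c]


def make_policy(mix, q_fn, beta, rng):
    """Hypothetical population member: with probability `mix`, take the action
    beta says was taken least often (coverage); otherwise act greedily on Q."""
    def act(state):
        if rng.random() < mix:
            p = beta.prob(state)
            return min(range(len(p)), key=lambda a: p[a])
        q = q_fn(state)
        return max(range(len(q)), key=lambda a: q[a])
    return act


class UCBMetaController:
    """Assumed meta-controller: a UCB1 bandit that picks one policy per episode
    and scores policies by their average episode return."""

    def __init__(self, n_policies, c=1.0):
        self.n = [0] * n_policies
        self.mean = [0.0] * n_policies
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        for i, n in enumerate(self.n):
            if n == 0:
                return i  # try each policy once before using the UCB score
        return max(
            range(len(self.n)),
            key=lambda i: self.mean[i] + self.c * math.sqrt(math.log(self.t) / self.n[i]),
        )

    def update(self, i, episode_return):
        self.n[i] += 1
        self.mean[i] += (episode_return - self.mean[i]) / self.n[i]  # running mean


if __name__ == "__main__":
    # Toy one-step task (assumption): the reward equals the chosen action index.
    rng = random.Random(0)
    beta = TabularBeta(n_actions=2)
    q_fn = lambda state: [0.0, 1.0]  # placeholder Q-values, not learned here
    population = [make_policy(m, q_fn, beta, rng) for m in (0.0, 0.5, 1.0)]
    meta = UCBMetaController(len(population))
    for episode in range(50):
        k = meta.select()              # meta-controller picks a policy for this episode
        action = population[k]("s")
        beta.update("s", action)       # record the behavior so beta stays current
        meta.update(k, float(action))  # feed the episode return back to the bandit
    print("beta(.|s) =", beta.prob("s"), "policy selections =", meta.n)
```

UCB1 is used here only because it is a familiar bandit rule. The paper's meta-controller may select policies by a different criterion, and a DQN-scale β would presumably need a learned model rather than per-state counts.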

Authors

(none)

Tags

  • Exploration

Stats

  • citations 0
  • S2 citations n/a
  • github stars 0
  • HF likes 0
  • heat score 0.00
  • arxiv key zhang2025beta
