Abstract

We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, *\(\epsilon t\)-greedy*, which generates exploratory options that target less-visited states. We prove that search using \(\epsilon t\)-greedy has polynomial sample complexity under mild MDP assumptions. To use the information provided by rewarded transitions more efficiently, we develop a new dual experience replay buffer framework, *GDRB*, and implement *longest \(n\)-step returns*. The resulting algorithm, *ETGL-DDPG*, integrates all three techniques, **\(\epsilon t\)**-greedy, **G**DRB, and **L**ongest \(n\)-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the perf
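
The abstract does not spell out how longest \(n\)-step returns are computed. As a rough illustration only, the sketch below assumes one common reading: for the transition at step t of an episode of length T, the target uses n = T - t, that is, the discounted sum of all remaining rewards plus an optional bootstrap value at the episode's last state. The function name `longest_n_step_targets` and the `bootstrap_value` parameter are hypothetical and are not taken from the paper.

```python
from typing import List


def longest_n_step_targets(
    rewards: List[float],
    gamma: float,
    bootstrap_value: float = 0.0,
) -> List[float]:
    """Return one target per step, each using the longest available horizon.

    Assumption (not from the paper): the target for step t is
        r_t + gamma * r_{t+1} + ... + gamma^(T-1-t) * r_{T-1}
            + gamma^(T-t) * bootstrap_value,
    where bootstrap_value is 0.0 for a terminal episode, or a critic
    estimate of the last state when the episode was truncated.
    """
    targets = [0.0] * len(rewards)
    running = bootstrap_value
    # Sweep backwards so each target reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        targets[t] = running
    return targets


if __name__ == "__main__":
    # Sparse-reward episode: only the final transition is rewarded.
    print(longest_n_step_targets([0.0, 0.0, 1.0], gamma=0.5))
```

Under this reading, a single rewarded transition at the end of a sparse-reward episode contributes to the target of every earlier transition in that episode in one pass, rather than propagating backwards one step per update as with 1-step targets.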

Authors

(none)

Tags

  • Policy Gradient
  • Exploration

Stats

  • citations: 0
  • S2 citations: not available
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: futuhi2024etgl

Related papers