ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm For Sparse Reward Continuous Control
2024 · Ehsan Futuhi, Shayan Karimi, Chao Gao, et al.
Abstract
We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, *\(\epsilon t\)-greedy*, which generates exploratory options for exploring less-visited states. We prove that search using \(\epsilon t\)-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, *GDRB*, and implement *longest n-step returns*. The resulting algorithm, *ETGL-DDPG*, integrates all three techniques: **\(\epsilon t\)**-greedy, **G**DRB, and **L**ongest \(n\)-step, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the perf
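The abstract names longest \(n\)-step returns without defining them. As a rough illustration only, the sketch below assumes one plausible reading: for every step of an episode, the return accumulates discounted rewards all the way to the episode's end, and bootstraps from a value estimate only when the episode was cut off before termination. The function name `longest_n_step_returns` and the `bootstrap_value` parameter are hypothetical and do not come from the paper.

```python
from typing import List, Optional


def longest_n_step_returns(
    rewards: List[float],
    gamma: float,
    bootstrap_value: Optional[float] = None,
) -> List[float]:
    """Discounted return from each step t to the end of the episode.

    Assumption (not taken from the paper): if the episode was truncated
    rather than terminated, `bootstrap_value` is a critic estimate for the
    final state and is discounted back along with the rewards.
    """
    # Start from the bootstrap estimate, or 0.0 for a terminated episode.
    tail = 0.0 if bootstrap_value is None else bootstrap_value
    returns = [0.0] * len(rewards)
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        tail = rewards[t] + gamma * tail
        returns[t] = tail
    return returns


# A sparse-reward episode: reward arrives only at the last step.
sparse_returns = longest_n_step_returns([0.0, 0.0, 1.0], gamma=0.5)
```

Under this reading, a single rewarded transition at the end of an episode propagates to every earlier step in one pass, which may be why such returns are attractive when rewards are sparse.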
Related papers
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019)
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019)
- DDPG++: Striving For Simplicity In Continuous-control Off-policy Reinforcement Learning (2020)
- Zeroth-order Deterministic Policy Gradient (2020)
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Deterministic Value-policy Gradients (2019)
- Expected Policy Gradients (2017)
- Learning To Explore With Meta-policy Gradient (2018)