WD3: Taming The Estimation Bias In Deep Reinforcement Learning
2020 · Qiang He, Xinwen Hou
Abstract
The overestimation phenomenon caused by function approximation is a well-known issue in value-based reinforcement learning algorithms such as deep Q-networks and DDPG, and it can lead to suboptimal policies. To address this issue, TD3 takes the minimum value between a pair of critics. In this paper, we show that the TD3 algorithm introduces underestimation bias under mild assumptions. To obtain a more precise estimate of the value function, we unify these two opposites and propose a novel algorithm, Weighted Delayed Deep Deterministic Policy Gradient (WD3), which can eliminate the estimation bias and further improve performance by weighting a pair of critics. To demonstrate the effectiveness of WD3, we compare the learning process of the value function across DDPG, TD3, and WD3. The results verify that our algorithm does eliminate the estimation error of value functions. Furthermore, we evaluate our algorithm on the continuous contro
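The contrast between taking the minimum of two critics and weighting them can be illustrated with a small sketch. The abstract does not spell out the WD3 target, so the blend below (a convex combination of the minimum and the mean of the two critic estimates, controlled by a hypothetical parameter beta) is an assumption made for illustration, not necessarily the authors' exact formula. The function names and the scalar interface are likewise hypothetical.

```python
def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Clipped double-Q target: bootstrap from the smaller critic estimate."""
    bootstrap = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * bootstrap)


def weighted_target(reward, gamma, q1_next, q2_next, beta=0.5, done=False):
    """Illustrative weighted target (assumed form, not taken from the paper).

    Blends the pessimistic minimum with the plain average of the two
    critics. beta=1.0 recovers the TD3-style minimum; beta=0.0 uses only
    the average.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    low = min(q1_next, q2_next)
    mean = 0.5 * (q1_next + q2_next)
    bootstrap = beta * low + (1.0 - beta) * mean
    return reward + (0.0 if done else gamma * bootstrap)


if __name__ == "__main__":
    # Two critics disagree about the next state-action value.
    print(td3_target(1.0, 0.5, 4.0, 2.0))
    print(weighted_target(1.0, 0.5, 4.0, 2.0, beta=0.5))
```

Under this assumed form, the weighted target always lies between the minimum-based target and the average-based target, which is one way a single parameter could trade off underestimation against overestimation.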
Related papers
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021) 0.00
- Value Activation For Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients (2021) 0.00
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023) 0.00
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021) 0.00
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021) 7.16
- Revisiting Estimation Bias In Policy Gradients For Deep Reinforcement Learning (2023) 0.00
- ADER: Adapting Between Exploration And Robustness For Actor-critic Methods (2021) 0.00
- Mitigating Estimation Bias With Representation Learning In TD Error-driven Regularization (2025) 0.00