Value Activation For Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients
2021 · Jiafei Lyu, Yu Yang, Jiangpeng Yan, et al.
Abstract
It is vital to accurately estimate the value function in Deep Reinforcement Learning (DRL) so that the agent executes proper actions instead of suboptimal ones. However, existing actor-critic methods suffer to varying degrees from underestimation or overestimation bias, which negatively affects their performance. In this paper, we reveal a simple but effective principle: proper value correction benefits bias alleviation. Based on it, we propose the generalized-activated weighting operator, which uses any non-decreasing function, called an activation function, as weights for better value estimation. In particular, we integrate the generalized-activated weighting operator into value estimation and introduce a novel algorithm, Generalized-activated Deep Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is capable of alleviating the potential estimation bias. Interestingly, we find that simple activation functions lead to satisfying performance with no additional tricks, a
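The weighting idea named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes the operator is approximated as a normalized weighted average over a finite set of sampled Q-values, and the names `activated_weighted_value`, `exponential`, and `beta` are hypothetical, chosen only for illustration. The abstract states only that a non-decreasing function supplies the weights; the sketch additionally assumes those weights are non-negative.

```python
import math


def activated_weighted_value(q_values, activation):
    """Average q_values with weights activation(q), normalized to sum to one.

    Assumption (not from the source): a finite list of sampled Q-values
    stands in for the action space, and the activation returns
    non-negative weights.
    """
    weights = [activation(q) for q in q_values]
    total = sum(weights)
    if total <= 0:
        # Degenerate case: every weight is zero, so fall back to the plain mean.
        return sum(q_values) / len(q_values)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def exponential(q, beta=1.0):
    """One possible non-decreasing activation; beta is a hypothetical temperature."""
    return math.exp(beta * q)


# Hypothetical Q-values for three sampled actions in one state.
q_samples = [1.0, 2.0, 3.0]

# A constant activation weights every sample equally (plain mean).
mean_estimate = activated_weighted_value(q_samples, lambda q: 1.0)

# An increasing activation puts more weight on larger Q-values.
exp_estimate = activated_weighted_value(q_samples, exponential)

print(mean_estimate, exp_estimate, max(q_samples))
```

In this example the exponentially weighted estimate falls between the plain mean and the maximum of the samples, which gives one intuition for how the choice of activation can move the value estimate up or down.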
Related papers
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- Softmax Deep Double Deterministic Policy Gradients (2020)
- Mitigating Estimation Bias With Representation Learning In TD Error-driven Regularization (2025)
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Value Improved Actor Critic Algorithms (2024)
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021)