Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients
2021 · Baturay Saglam, Furkan Burak Mutlu, Dogan Can Cicek, et al.
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate th
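To make the update described in the abstract concrete, here is a minimal sketch of a Q-value target built from a linear combination of two critic estimates with a sampled weight. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the sampling distribution or the bounds of the "shrunk estimation bias interval", so the uniform draw and the default bounds `low` and `high` are placeholders, and the names `blended_target` and `sample_beta` are hypothetical.

```python
import random


def blended_target(reward, q1_next, q2_next, beta, gamma=0.99, done=False):
    """One-step target from a convex combination of two critic estimates.

    beta = 1.0 uses only the smaller estimate (the clipped double
    Q-learning target), beta = 0.0 uses only the larger one.
    """
    q_min = min(q1_next, q2_next)
    q_max = max(q1_next, q2_next)
    blended = beta * q_min + (1.0 - beta) * q_max
    # No bootstrapping from the next state when the episode has ended.
    return reward + (0.0 if done else gamma * blended)


def sample_beta(rng, low=0.5, high=1.0):
    """Draw the combination weight.

    ASSUMPTION: uniform sampling on [low, high] is a placeholder; the
    source does not state the interval or the distribution.
    """
    return rng.uniform(low, high)


rng = random.Random(0)
beta = sample_beta(rng)
target = blended_target(reward=1.0, q1_next=2.0, q2_next=4.0, beta=beta)
```

Because the weight stays in [0, 1], the blended estimate always lies between the two critic values, so the target is never more pessimistic than the minimum nor more optimistic than the maximum.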
Related papers
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- Revisiting Estimation Bias In Policy Gradients For Deep Reinforcement Learning (2023)
- Value Activation For Bias Alleviation: Generalized-activated Deep Double Deterministic Policy Gradients (2021)
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)