Adaptively Calibrated Critic Estimates For Deep Reinforcement Learning
2021 · Nicolai Dorka, Tim Welschehold, Joschka Boedecker, et al.
Abstract
Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning are typically prone to an over- or underestimation bias that builds up over time. In this paper, we propose a general method called Adaptively Calibrated Critics (ACC) that uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal difference targets. We apply ACC to Truncated Quantile Critics, an algorithm for continuous control that allows regulation of the bias with a hyperparameter tuned per environment. The resulting algorithm adaptively adjusts the parameter during training, rendering hyperparameter search unnecessary, and sets a new state of the art on the OpenAI Gym continuous control benchmark among all algorithms that do not tune hyperparameters for each environment. ACC further achieves improved results on different tasks from the Meta-World robot benchmark. Additionally, we demonstrate t
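The core idea in the abstract, comparing unbiased on-policy returns against the critic's estimates and nudging a bias-control parameter accordingly, can be illustrated with a minimal sketch. This is not the authors' update rule, which the abstract does not specify. The function names, the sign-based step, the clamping range, and the convention that a larger `beta` means more optimistic targets are all assumptions made for illustration.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo return from every step of one finished on-policy rollout."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step sees the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def update_bias_parameter(
    beta: float,
    q_estimates: Sequence[float],
    returns: Sequence[float],
    step_size: float = 0.1,   # hypothetical step size
    beta_min: float = 0.0,    # hypothetical lower bound
    beta_max: float = 5.0,    # hypothetical upper bound
) -> float:
    """Move beta toward whichever side reduces the observed critic bias.

    Assumed convention: a larger beta makes the TD targets more optimistic.
    If rollout returns exceed the critic's estimates on average, the critic
    underestimates and beta is raised; otherwise it is lowered.
    """
    gap = sum(r - q for r, q in zip(returns, q_estimates)) / len(returns)
    if gap > 0:
        beta += step_size
    elif gap < 0:
        beta -= step_size
    # Keep the parameter inside its allowed range.
    return min(beta_max, max(beta_min, beta))
```

In a training loop, a step like this would run periodically on the most recent rollouts, with `q_estimates` taken from the critic at the visited state-action pairs. How the resulting parameter maps onto the Truncated Quantile Critics hyperparameter is left open here.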
Related papers
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021)
- Stochastic Actor-critic: Mitigating Overestimation Via Temporal Aleatoric Uncertainty (2026)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Adviser-actor-critic: Eliminating Steady-state Error In Reinforcement Learning Control (2025)
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)