Moderate Actor-critic Methods: Controlling Overestimation Bias Via Expectile Loss
2025 · Ukjo Hwang, Songnam Hong
Abstract
Overestimation is a fundamental characteristic of model-free reinforcement learning (MF-RL), arising from the principles of temporal difference learning and the approximation of the Q-function. To address this challenge, we propose a novel moderate target in the Q-function update, formulated as a convex optimization of an overestimated Q-function and its lower bound. Our primary contribution lies in the efficient estimation of this lower bound through the lower expectile of the Q-value distribution conditioned on a state. Notably, our moderate target integrates seamlessly into state-of-the-art (SOTA) MF-RL algorithms, including Deep Deterministic Policy Gradient (DDPG) and Soft Actor Critic (SAC). Experimental results validate the effectiveness of our moderate target in mitigating overestimation bias in DDPG, SAC, and distributional RL algorithms.
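To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the moderate target is a weighted mix of the usual bootstrapped Q-value and a lower bound, and that the lower bound is a lower expectile (tau < 0.5) fitted with the standard asymmetric squared loss. The names `expectile_loss`, `fit_expectile`, `moderate_target`, and the mixing weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def expectile_loss(pred, target, tau):
    """Asymmetric squared loss. With tau < 0.5, targets below the
    prediction are weighted more heavily, which pulls the minimizer
    toward a lower expectile of the targets."""
    diff = np.asarray(target, dtype=float) - pred
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Estimate the tau-expectile of `samples` by plain gradient descent
    on the expectile loss (a scalar stand-in for a value network)."""
    samples = np.asarray(samples, dtype=float)
    v = float(np.mean(samples))  # tau = 0.5 recovers the mean
    for _ in range(steps):
        diff = samples - v
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = -2.0 * np.mean(weight * diff)
        v -= lr * grad
    return v


def moderate_target(reward, gamma, q_next, lower_next, lam):
    """Assumed form of the moderate target: a convex mix (0 <= lam <= 1)
    of the possibly overestimated next-state Q-value and its lower bound."""
    mixed = lam * q_next + (1.0 - lam) * lower_next
    return reward + gamma * mixed


# Hypothetical Q-values of several actions sampled at one next state.
q_samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
lower = fit_expectile(q_samples, tau=0.1)
target = moderate_target(reward=1.0, gamma=0.99,
                         q_next=q_samples.max(), lower_next=lower, lam=0.5)
```

Under these assumptions, `lam = 1` gives the ordinary TD target and `lam = 0` bootstraps only from the lower bound, so intermediate values trade overestimation against underestimation.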
Related papers
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023)
- Mitigating Estimation Bias With Representation Learning In TD Error-driven Regularization (2025)
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021)
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020)
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)
- Stochastic Actor-critic: Mitigating Overestimation Via Temporal Aleatoric Uncertainty (2026)
- Adaptively Calibrated Critic Estimates For Deep Reinforcement Learning (2021)