Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning
2023 · Junmin Zhong, Ruofan Wu, Jennie Si
Abstract
We address estimation bias in deep reinforcement learning (DRL) by introducing solution mechanisms that include a new twin TD-regularized actor-critic (TDR) method, which aims to reduce both over- and underestimation errors. By combining TDR with established DRL improvements, such as distributional learning and the long N-step surrogate stage reward (LNSS) method, we show that our new TDR-based actor-critic learning enables DRL methods to outperform their respective baselines in challenging environments from the DeepMind Control Suite. Furthermore, these mechanisms elevate TD3 and SAC to a level of performance comparable to that of D4PG (the current SOTA), and they also raise D4PG to a new SOTA level, as measured by mean reward, convergence speed, learning success rate, and learning variance.
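The abstract names the problem and two ingredients without giving formulas. The Python sketch below illustrates them under stated assumptions; it is not the authors' TDR implementation, whose regularization terms the abstract does not specify. `bias_demo` shows the two biases TDR targets: a greedy max over noisy critic estimates drifts high, while a TD3-style min over twin critics drifts low. `lnss_reward` assumes LNSS replaces the stage reward with the discount-weighted average of the next N rewards, (1 - γ)/(1 - γ^N) · Σ_{k=0}^{N-1} γ^k r_{t+k}. Both function names and all parameters are hypothetical.

```python
import random
import statistics


def lnss_reward(rewards, gamma):
    """Hypothetical LNSS-style surrogate stage reward (assumed form).

    Returns the constant reward r' whose N-step discounted sum equals the
    discounted sum of the given N rewards, i.e. a discount-weighted average.
    """
    n = len(rewards)
    discounted_sum = sum(gamma ** k * r for k, r in enumerate(rewards))
    if gamma == 1.0:
        # Undiscounted case: the weights are uniform, so r' is the plain mean.
        return discounted_sum / n
    return discounted_sum * (1.0 - gamma) / (1.0 - gamma ** n)


def bias_demo(n_trials=20_000, n_actions=4, noise=1.0, seed=0):
    """Illustrate over- vs underestimation from noisy critics.

    Every action's true Q-value is 0; each critic observes it through
    independent zero-mean Gaussian noise. Returns (max_bias, min_bias).
    """
    rng = random.Random(seed)
    max_single, min_twin = [], []
    for _ in range(n_trials):
        q1 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        # Greedy target from one critic: max over noisy estimates biases upward.
        max_single.append(max(q1))
        # Clipped double-Q (TD3-style): min of twin estimates biases downward.
        min_twin.append(min(q1[0], q2[0]))
    return statistics.fmean(max_single), statistics.fmean(min_twin)


if __name__ == "__main__":
    print(lnss_reward([1.0, 2.0, 3.0], 0.99))
    print(bias_demo())
```

For example, `lnss_reward([1.0, 1.0, 1.0], 0.9)` returns 1.0 up to floating-point rounding, because a constant reward stream is its own discounted average. In `bias_demo`, the first value comes out clearly positive and the second clearly negative even though every true Q-value is zero, which is the pair of errors a bias-mitigating critic has to balance.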
Related papers
- Double Actor-critic With TD Error-driven Regularization In Reinforcement Learning (2024)3.58
- Mitigating Estimation Bias With Representation Learning In TD Error-driven Regularization (2025)0.00
- Broad Critic Deep Actor Reinforcement Learning For Continuous Control (2024)0.00
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025)0.00
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)10.21
- Moderate Actor-critic Methods: Controlling Overestimation Bias Via Expectile Loss (2025)0.00
- ADER: Adapting Between Exploration And Robustness For Actor-critic Methods (2021)0.00
- Pseudo-quantized Actor-critic Algorithm For Robustness To Noisy Temporal Difference Error (2026)0.00