Double Actor-critic With TD Error-driven Regularization In Reinforcement Learning
2024 · Haohui Chen, Zhiyong Chen, Aoxiang Liu, et al.
Abstract
To obtain better value estimation in reinforcement learning, we propose a novel algorithm based on the double actor-critic framework with temporal difference error-driven regularization, abbreviated as TDDR. TDDR employs double actors, with each actor paired with a critic, thereby fully leveraging the advantages of double critics. Additionally, TDDR introduces an innovative critic regularization architecture. Compared to classical deterministic policy gradient-based algorithms that lack a double actor-critic structure, TDDR provides superior estimation. Moreover, unlike existing algorithms with double actor-critic frameworks, TDDR does not introduce any additional hyperparameters, significantly simplifying the design and implementation process. Experiments demonstrate that TDDR exhibits strong competitiveness compared to benchmark algorithms in challenging continuous control tasks.
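The abstract does not spell out how the TD error drives the regularization, so the following is only a minimal sketch under an assumed rule, not the authors' confirmed method: each of the two actors proposes a next action, a target critic scores each proposal, and the candidate target whose TD error is smaller in magnitude is kept per sample. The function name `td_error_driven_target` and all of its parameters are hypothetical, introduced here for illustration.

```python
import numpy as np

def td_error_driven_target(q_current, reward, gamma, next_q_a, next_q_b, done):
    """Pick, per sample, the candidate target with the smaller absolute TD error.

    ASSUMPTION: this selection rule is illustrative only; the abstract does not
    state the exact rule used by TDDR.

    q_current : current critic estimate Q(s, a) for each sample
    next_q_a  : target-critic value at the next action proposed by actor A
    next_q_b  : target-critic value at the next action proposed by actor B
    done      : 1.0 where the episode terminated, else 0.0
    """
    not_done = 1.0 - done
    # Two candidate bootstrapped targets, one per actor's proposed next action.
    target_a = reward + gamma * not_done * next_q_a
    target_b = reward + gamma * not_done * next_q_b
    # TD error of each candidate against the current critic estimate.
    td_a = target_a - q_current
    td_b = target_b - q_current
    # Keep the candidate whose TD error is smaller in magnitude (ties go to A).
    use_a = np.abs(td_a) <= np.abs(td_b)
    return np.where(use_a, target_a, target_b)

q_current = np.array([1.0, 2.0])
reward = np.array([0.0, 1.0])
next_q_a = np.array([2.0, 4.0])
next_q_b = np.array([4.0, 2.0])
done = np.array([0.0, 0.0])
targets = td_error_driven_target(q_current, reward, 0.5, next_q_a, next_q_b, done)
```

Note that a rule of this kind needs no extra tunable coefficient beyond the discount factor, which is consistent with the abstract's claim that no additional hyperparameters are introduced, although the actual mechanism may differ.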
Related papers
- Mitigating Estimation Bias With Representation Learning In TD Error-driven Regularization (2025)
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023)
- Pseudo-quantized Actor-critic Algorithm For Robustness To Noisy Temporal Difference Error (2026)
- ADER: Adapting Between Exploration And Robustness For Actor-critic Methods (2021)
- Softmax Deep Double Deterministic Policy Gradients (2020)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- Broad Critic Deep Actor Reinforcement Learning For Continuous Control (2024)
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)