Pseudo-quantized Actor-critic Algorithm For Robustness To Noisy Temporal Difference Error
2026 · Taisuke Kobayashi
Abstract
In reinforcement learning (RL), temporal difference (TD) errors are widely used to optimize value and policy functions. However, because the TD error is defined by bootstrapping, its computation tends to be noisy and can destabilize learning. Heuristics that improve the accuracy of TD errors, such as target networks and ensemble models, have been introduced. While these are essential to current deep RL algorithms, they have side effects such as increased computational cost and reduced learning efficiency. Therefore, this paper revisits the TD learning algorithm based on control as inference, deriving a novel algorithm capable of robust learning against noisy TD errors. First, the distribution model of optimality, a binary random variable, is represented by a sigmoid function. Combined with forward and reverse Kullback-Leibler divergences, this new model yields a robust learning rule: when the sigmoid function saturates with a large TD error, probably due to noise, t
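The two ingredients named in the abstract, a bootstrapped TD error and a sigmoid that saturates for large inputs, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names and numeric values below are hypothetical, and the only facts relied on are the standard one-step TD error and the logistic derivative σ'(x) = σ(x)(1 − σ(x)), which shrinks toward zero as |x| grows.

```python
import math


def td_error(reward, gamma, v_next, v_curr):
    """Standard one-step TD error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_curr


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the logistic function: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Hypothetical values, chosen only for illustration.
# A moderate TD error from a reasonable bootstrap estimate V(s') = 0.5.
delta_small = td_error(reward=1.0, gamma=0.99, v_next=0.5, v_curr=1.0)
# A large TD error caused by a (hypothetically) noisy estimate V(s') = 30.0.
delta_noisy = td_error(reward=1.0, gamma=0.99, v_next=30.0, v_curr=1.0)

# The sigmoid's slope is largest near zero and nearly vanishes when saturated,
# so a signal passed through it contributes little when the input is extreme.
grad_small = sigmoid_grad(delta_small)
grad_noisy = sigmoid_grad(delta_noisy)
```

Comparing `grad_small` with `grad_noisy` shows the general saturation effect only; how the paper turns it into a learning rule is not recoverable from the truncated abstract.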
Related papers
- Double Actor-critic With TD Error-driven Regularization In Reinforcement Learning (2024)
- Discerning Temporal Difference Learning (2023)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Gradient Temporal-difference Learning With Regularized Corrections (2020)
- DROP: Distributional And Regular Optimism And Pessimism For Reinforcement Learning (2024)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Adversarially-robust TD Learning With Markovian Data: Finite-time Rates And Fundamental Limits (2025)