An Analysis Of Quantile Temporal-difference Learning
2023 · Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, et al.
Abstract
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed with standard stochastic approximation tools, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points. The core result of this paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1, putting QTD on firm theoretical footing. The proof establishes connections between QTD and non-linear differential inclusions through stochastic approximation theory and non-smooth analysis.
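To make the object of study concrete, the following is a minimal sketch of a single tabular QTD update, assuming the commonly used form in which each state holds m quantile estimates at levels tau_i = (2i - 1) / (2m). The function name `qtd_update`, the dictionary layout of `theta`, and the default step size and discount are illustrative assumptions, not taken from the abstract.

```python
def qtd_update(theta, x, r, x_next, alpha=0.1, gamma=0.9):
    """Apply one tabular QTD update to the quantile estimates of state x.

    theta maps each state to a list of m quantile estimates (assumed layout).
    """
    m = len(theta[x])
    # Bootstrapped sample targets built from the next state's estimates.
    targets = [r + gamma * t for t in theta[x_next]]
    updated = []
    for i, th in enumerate(theta[x]):
        tau = (2 * i + 1) / (2 * m)  # quantile level for 0-based index i
        # Fraction of targets that fall strictly below the current estimate.
        below = sum(1 for t in targets if t < th) / m
        # Step up by tau, down by the fraction of targets below: the
        # increment depends only on the sign of each TD error.
        updated.append(th + alpha * (tau - below))
    theta[x] = updated
    return theta
```

Because each increment depends only on the signs of the temporal-difference errors and not on their magnitudes, the update is bounded and non-linear in the estimates, which is consistent with the abstract's remark that standard contraction-based arguments do not apply directly.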
Related papers
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Target-based Temporal Difference Learning (2019)
- Discerning Temporal Difference Learning (2023)
- Nonlinear Distributional Gradient Temporal-difference Learning (2018)
- Accelerated Distributional Temporal Difference Learning With Linear Function Approximation (2025)
- Stability And Sensitivity Analysis Of Relative Temporal-difference Learning: Extended Version (2026)