Uncertainty Quantification For Markov Chain Induced Martingales With Application To Temporal Difference Learning
2025 Β· Weichen Wu, Yuting Wei, Alessandro Rinaldo
Abstract
We establish novel and general high-dimensional concentration inequalities and Berry-Esseen bounds for vector-valued martingales induced by Markov chains. We apply these results to analyze the performance of the Temporal Difference (TD) learning algorithm with linear function approximations, a widely used method for policy evaluation in Reinforcement Learning (RL), obtaining a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish an \(O(T^\{-\frac\{1\}\{4\}\}log T)\) distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. Our martingale bounds are of broad applicability, and our analysis of TD learning provides new insights into statistical inference for RL algorithms, bridging gaps between classical stochastic approximation theory and modern RL applications.
Authors
(none)
Tags
Stats
Related papers
- Policy Evaluation And Temporal-difference Learning In Continuous Time And Space: A Martingale Approach (2021)4.52
- Characterizing The Exact Behaviors Of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory (2019)0.00
- Concentration Of Contractive Stochastic Approximation And Reinforcement Learning (2021)0.00
- Adversarially-robust TD Learning With Markovian Data: Finite-time Rates And Fundamental Limits (2025)0.00
- Exact Formulas For Finite-time Estimation Errors Of Decentralized Temporal Difference Learning With Linear Function Approximation (2022)0.00
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)0.00
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)9.59
- Control Theoretic Analysis Of Temporal Difference Learning (2021)0.00