Convergence Of TD(0) Under Polynomial Mixing With Nonlinear Function Approximation
2025 Β· Anupama Sridhar, Alexander Johansen
Abstract
Temporal Difference Learning (TD(0)) is fundamental in reinforcement learning, yet its finite-sample behavior under non-i.i.d. data and nonlinear approximation remains unknown. We provide the first high-probability, finite-sample analysis of vanilla TD(0) on polynomially mixing Markov data, assuming only Holder continuity and bounded generalized gradients. This breaks with previous work, which often requires subsampling, projections, or instance-dependent step-sizes. Concretely, for mixing exponent \(\beta > 1\), Holder continuity exponent \(\gamma\), and step-size decay rate \(\eta \in (1/2, 1]\), we show that, with high probability, \[ \| \theta_t - \theta^* \| \leq C(\beta, \gamma, \eta)\, t^\{-\beta/2\} + C'(\gamma, \eta)\, t^\{-\eta\gamma\} \] after \(t = \mathcal\{O\}(1/\epsilon^2)\) iterations. These bounds match the known i.i.d. rates and hold even when initialization is nonstationary. Central to our proof is a novel discrete-time coupling that bypasses geometric ergodicity, yi
Authors
(none)
Tags
Stats
Related papers
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)0.00
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)0.00
- Geometric Insights Into The Convergence Of Nonlinear TD Learning (2019)0.00
- Finite Sample Analysis Of Linear Temporal Difference Learning With Arbitrary Features (2025)0.00
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)9.59
- Analysis Of Off-policy \(n\)-step Td-learning With Linear Function Approximation (2025)0.00
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)0.00
- Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning (2020)0.00