Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples
2019 · Tengyu Xu, Shaofeng Zou, Yingbin Liang
Abstract
Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that two time-scale TDC can converge as fast as O(log t / t^(2/3)) under a diminishing stepsize, and can converge exponentially fast under a constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwisely diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwisely linear convergence rate. Our experiments demonstrate that such an algorithm co
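To make the setting in the abstract concrete, the following is a minimal sketch of two time-scale TDC with linear function approximation, run on a single Markovian trajectory generated by a behavior policy. It uses the commonly stated importance-weighted TDC update, which may differ in detail from the form analyzed in the paper, and it includes no projection step. The three-state MDP, the feature vectors, both policies, the function name `tdc`, and the stepsize constants and exponents (`c_alpha`, `c_beta`, `sigma`, `nu`) are all hypothetical choices made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def tdc(P, R, Phi, pi, mu, gamma=0.9, steps=2000,
        c_alpha=0.5, c_beta=0.5, sigma=1.0, nu=2 / 3, seed=0):
    """Two time-scale TDC on one Markovian sample path (illustrative sketch).

    theta is the slow iterate (value-function weights) and w is the fast
    iterate (gradient-correction weights). Stepsizes diminish polynomially;
    choosing nu < sigma makes w move on the faster time scale.
    """
    rng = random.Random(seed)
    n_states, n_actions, d = len(Phi), len(mu[0]), len(Phi[0])
    theta = [0.0] * d
    w = [0.0] * d
    s = 0
    for t in range(steps):
        # Sample the next transition along the same trajectory (non-i.i.d.).
        a = rng.choices(range(n_actions), weights=mu[s])[0]
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        rho = pi[s][a] / mu[s][a]  # importance sampling ratio
        phi, phi_next = Phi[s], Phi[s_next]
        # TD error and correction term, both from the current iterates.
        delta = R[s][a] + gamma * dot(phi_next, theta) - dot(phi, theta)
        phi_w = dot(phi, w)
        alpha = c_alpha / (1 + t) ** sigma  # slow stepsize
        beta = c_beta / (1 + t) ** nu       # fast stepsize
        theta = [th + alpha * rho * (delta * f - gamma * fn * phi_w)
                 for th, f, fn in zip(theta, phi, phi_next)]
        w = [wi + beta * (rho * delta * f - f * phi_w)
             for wi, f in zip(w, phi)]
        s = s_next
    return theta, w


# Hypothetical toy problem: 3 states, 2 actions, 2 features.
P = [  # P[s][a] is the next-state distribution
    [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]],
    [[0.3, 0.3, 0.4], [0.2, 0.1, 0.7]],
    [[0.6, 0.2, 0.2], [0.3, 0.5, 0.2]],
]
R = [[1.0, 0.0], [0.0, 0.5], [0.2, 1.0]]    # R[s][a]
Phi = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # unit-norm feature vectors
mu = [[0.5, 0.5]] * 3                       # behavior policy
pi = [[0.8, 0.2]] * 3                       # target policy

theta, w = tdc(P, R, Phi, pi, mu)
print("theta:", theta)
print("w:", w)
```

Replacing the two diminishing stepsizes with constants gives the constant-stepsize regime the abstract contrasts with, where convergence is faster but a residual error remains.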
Related papers
- Non-asymptotic Analysis For Two Time-scale TDC With General Smooth Function Approximation (2021)
- Variance-reduced Off-policy TDC Learning: Non-asymptotic Convergence Analysis (2020)
- Finite-sample Analysis Of Proximal Gradient TD Algorithms (2020)
- Baird Counterexample Is Solved: With An Example Of How To Debug A Two-time-scale Algorithm (2023)
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning (2020)
- Analysis Of Off-policy n-step TD-learning With Linear Function Approximation (2025)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)