Variance-reduced Off-policy TDC Learning: Non-asymptotic Convergence Analysis
2020 · Shaocong Ma, Yi Zhou, Shaofeng Zou
Abstract
Variance reduction techniques have been successfully applied to temporal-difference (TD) learning and help to improve the sample complexity in policy evaluation. However, the existing work applied variance reduction to either the less popular one time-scale TD algorithm or the two time-scale GTD algorithm but with a finite number of i.i.d. samples, and both algorithms apply to only the on-policy setting. In this work, we develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting and analyze its non-asymptotic convergence rate over both i.i.d. and Markovian samples. In the i.i.d. setting, our algorithm matches the best-known lower bound \(\tilde{O}(\epsilon^{-1})\). In the Markovian setting, our algorithm achieves the state-of-the-art sample complexity \(O(\epsilon^{-1} \log \epsilon^{-1})\) that is near-optimal. Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller asymptotic convergence error than bo
Related papers
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)
- Reanalysis Of Variance Reduced Temporal Difference Learning (2020)
- Non-asymptotic Analysis For Two Time-scale TDC With General Smooth Function Approximation (2021)
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- On A Convergent Off-policy Temporal Difference Learning Algorithm In On-line Learning Environment (2016)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- High-probability Sample Complexities For Policy Evaluation With Linear Function Approximation (2023)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)