Finite-sample Analysis Of Proximal Gradient TD Algorithms
2020 Β· Bo Liu, Ji Liu, Mohammad Ghavamzadeh, et al.
Abstract
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms. Previous analyses of this class of algorithms use ODE techniques to prove asymptotic convergence, and to the best of our knowledge, no finite-sample analysis has been done. Moreover, there has been not much work on finite-sample analysis for convergent off-policy reinforcement learning algorithms. In this paper, we formulate GTD methods as stochastic gradient algorithms w.r.t.~a primal-dual saddle-point objective function, and then conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Two revised algorithms are also proposed, namely projected GTD2 and GTD2-MP, which offer improved convergence guarantees and acceleration, respectively. The results of our theoretical analysis show that the GTD family of algorithms are indeed comparable to the existing LSTD methods in off-policy learning scenarios.
Authors
(none)
Tags
Stats
Related papers
- Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning With Polynomial Sample Complexity (2020)5.84
- Regularized Gradient Temporal-difference Learning (2026)0.00
- New Versions Of Gradient Temporal Difference Learning (2021)0.00
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)0.00
- Finite Sample Analysis Of The GTD Policy Evaluation Algorithms In Markov Setting (2018)0.00
- Nonlinear Distributional Gradient Temporal-difference Learning (2018)0.00
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)0.00
- O\(^2\)TD: (near)-optimal Off-policy TD Learning (2017)0.00