Sample Complexity Bounds For Two Timescale Value-based Reinforcement Learning Algorithms
2020 · Tengyu Xu, Yingbin Liang
Abstract
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms. In the policy evaluation setting, it can model the linear and nonlinear temporal difference learning with gradient correction (TDC) algorithms as linear SA and nonlinear SA, respectively. In the policy optimization setting, two timescale nonlinear SA can also model the greedy gradient-Q (Greedy-GQ) algorithm. In previous studies, the non-asymptotic analysis of linear TDC and Greedy-GQ has been studied in the Markovian setting, with diminishing or accuracy-dependent stepsize. For the nonlinear TDC algorithm, only the asymptotic convergence has been established. In this paper, we study the non-asymptotic convergence rate of two timescale linear and nonlinear TDC and Greedy-GQ under Markovian sampling and with accuracy-independent constant stepsize. For linear TDC, we provide a novel non-asymptotic analysis and show that it attains an \(\epsilon\)-accurate solution with the o
Related papers
- Finite Sample Analysis Of Two-timescale Stochastic Approximation With Applications To Reinforcement Learning (2017)
- Finite Time Analysis Of Linear Two-timescale Stochastic Approximation With Markovian Noise (2020)
- A Tale Of Two-timescale Reinforcement Learning With The Tightest Finite-time Bound (2019)
- Finite-time Performance Bounds And Adaptive Learning Rate Selection For Two Time-scale Reinforcement Learning (2019)
- Non-asymptotic Analysis For Two Time-scale TDC With General Smooth Function Approximation (2021)
- Fast Two-time-scale Stochastic Gradient Method With Applications In Reinforcement Learning (2024)
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)
- High-probability Sample Complexities For Policy Evaluation With Linear Function Approximation (2023)