Finite-sample Analysis Of Greedy-GQ With Linear Function Approximation Under Markovian Noise
2020 · Yue Wang, Shaofeng Zou
Abstract
Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis for the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two-timescale algorithm for faster convergence in practice, and suggests a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of more practical interest. Specifically, in contrast to existing finite-sample analyses for two-timescale methods, e.g., GTD, GTD2 and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is also not a linear two-timescale stochastic approximation algorithm. Our techniqu
Related papers
- Greedy-GQ With Variance Reduction: Finite-time Analysis And Improved Complexity (2021)
- Finite Sample Analysis Of The GTD Policy Evaluation Algorithms In Markov Setting (2018)
- Sample Complexity Bounds For Two Timescale Value-based Reinforcement Learning Algorithms (2020)
- Full Error Analysis Of Policy Gradient Learning Algorithms For Exploratory Linear Quadratic Mean-field Control Problem In Continuous Time With Common Noise (2024)
- Convergence Of Policy Gradient Methods For Finite-horizon Exploratory Linear-quadratic Control Problems (2022)
- Finite-sample Analysis Of Proximal Gradient TD Algorithms (2020)
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)