Greedy-gq With Variance Reduction: Finite-time Analysis And Improved Complexity
2021 Β· Shaocong Ma, Ziyi Chen, Yi Zhou, et al.
Abstract
Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control. Recently, the finite-time analysis of Greedy-GQ has been developed under linear function approximation and Markovian sampling, and the algorithm is shown to achieve an \(\epsilon\)-stationary point with a sample complexity in the order of \(\mathcal\{O\}(\epsilon^\{-3\})\). Such a high sample complexity is due to the large variance induced by the Markovian samples. In this paper, we propose a variance-reduced Greedy-GQ (VR-Greedy-GQ) algorithm for off-policy optimal control. In particular, the algorithm applies the SVRG-based variance reduction scheme to reduce the stochastic variance of the two time-scale updates. We study the finite-time convergence of VR-Greedy-GQ under linear function approximation and Markovian sampling and show that the algorithm achieves a much smaller bias and variance error than the original Greedy-GQ. In particular, we prove that VR-Greedy-GQ achieves an improved sample comp
Authors
(none)
Tags
Stats
Related papers
- Finite-sample Analysis Of Greedy-gq With Linear Function Approximation Under Markovian Noise (2020)0.00
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)0.00
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)0.00
- Sample Complexity Of Variance-reduced Distributionally Robust Q-learning (2023)0.00
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)0.00
- Variance-reduced Cascade Q-learning: Algorithms And Sample Complexity (2024)0.00
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)0.00
- Oracle Complexity Reduction For Model-free LQR: A Stochastic Variance-reduced Policy Gradient Approach (2023)2.26