Variance-reduced Cascade Q-learning: Algorithms And Sample Complexity
2024 Β· Mohammad Boveiri, Peyman Mohajerin Esfahani
Abstract
We study the problem of estimating the optimal Q-function of \(\gamma\)-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the \(\ell_\infty\)-norm compared with the existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance optimality while requiring the minimum number of samples theoretically possible. Our theoretical
Authors
(none)
Tags
Stats
Related papers
- Sample Complexity Of Variance-reduced Distributionally Robust Q-learning (2023)0.00
- Sample Complexity Of Asynchronous Q-learning: Sharper Analysis And Variance Reduction (2020)11.19
- Q-learning With Uniformly Bounded Variance: Large Discounting Is Not A Barrier To Fast Learning (2020)0.00
- Is Q-learning Minimax Optimal? A Tight Sample Complexity Analysis (2021)0.00
- Greedy-gq With Variance Reduction: Finite-time Analysis And Improved Complexity (2021)0.00
- Entropic Risk Optimization In Discounted Mdps: Sample Complexity Bounds With A Generative Model (2025)0.00
- Q-learning With UCB Exploration Is Sample Efficient For Infinite-horizon MDP (2019)0.00
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)0.00