Is Q-learning Minimax Optimal? A Tight Sample Complexity Analysis
2021 Β· Gen Li, Changxiao Cai, Yuxin Chen, et al.
Abstract
Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made towards understanding the sample efficiency of Q-learning. Consider a \(\gamma\)-discounted infinite-horizon MDP with state space \(\mathcal\{S\}\) and action space \(\mathcal\{A\}\): to yield an entrywise \(\epsilon\)-approximation of the optimal Q-function, state-of-the-art theory for Q-learning requires a sample size exceeding the order of \(\frac\{|\mathcal\{S\}||\mathcal\{A\}|\}\{(1-\gamma)^5\epsilon^\{2\}\}\), which fails to match existing minimax lower bounds. This gives rise to natural questions: what is the sharp sample complexity of Q-learning? Is Q-learning provably sub-optimal? This paper addresses these questions for the synchrono
Authors
(none)
Tags
Stats
Related papers
- Sample Complexity Of Asynchronous Q-learning: Sharper Analysis And Variance Reduction (2020)11.19
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022)0.00
- Minimax Optimal Q Learning With Nearest Neighbors (2023)8.09
- Q-learning With Uniformly Bounded Variance: Large Discounting Is Not A Barrier To Fast Learning (2020)0.00
- Q-learning With UCB Exploration Is Sample Efficient For Infinite-horizon MDP (2019)0.00
- Learning The Model While Learning Q: Finite-time Sample Complexity Of Online Syncmbq (2024)0.00
- Sample Complexity Of Variance-reduced Distributionally Robust Q-learning (2023)0.00
- Sample Efficient Reinforcement Learning With Partial Dynamics Knowledge (2023)3.58