Minimax Optimal Q Learning With Nearest Neighbors
2023 Β· Puning Zhao, Lifeng Lai
Abstract
Analyzing the Markov decision process (MDP) with continuous state spaces is generally challenging. A recent interesting work \cite\{shah2018q\} solves MDP with bounded continuous state space by a nearest neighbor \(Q\) learning approach, which has a sample complexity of \(\tilde\{O\}(\frac\{1\}\{\epsilon^\{d+3\}(1-\gamma)^\{d+7\}\})\) for \(\epsilon\)-accurate \(Q\) function estimation with discount factor \(\gamma\). In this paper, we propose two new nearest neighbor \(Q\) learning methods, one for the offline setting and the other for the online setting. We show that the sample complexities of these two methods are \(\tilde\{O\}(\frac\{1\}\{\epsilon^\{d+2\}(1-\gamma)^\{d+2\}\})\) and \(\tilde\{O\}(\frac\{1\}\{\epsilon^\{d+2\}(1-\gamma)^\{d+3\}\})\) for offline and online methods respectively, which significantly improve over existing results and have minimax optimal dependence over \(\epsilon\). We achieve such improvement by utilizing the samples more efficiently. In particular, the
Authors
(none)
Tags
Stats
Related papers
- Q-learning For Mdps With General Spaces: Convergence And Near Optimality Via Quantization Under Weak Continuity (2021)0.00
- Is Q-learning Minimax Optimal? A Tight Sample Complexity Analysis (2021)0.00
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear Mdps (2021)0.00
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024)6.34
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022)0.00
- Scalable Spectral Representations For Multi-agent Reinforcement Learning In Network Mdps (2024)0.00
- Confident Natural Policy Gradient For Local Planning In \(q_\pi\)-realizable Constrained Mdps (2024)0.00
- Structured Difference-of-q Via Orthogonal Learning (2024)0.00