A Finite-time Analysis Of Q-learning With Neural Network Function Approximation
2019 Β· Pan Xu, Quanquan Gu
Abstract
Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with \(O(1/\sqrt\{T\})\) convergence rate if the neural function approximator is sufficiently overparameterized, where \(T\) is the number of iterations. To our best knowledge, our result is the first finite-time analysis of neural Q-learning under non-i.i.d. data assumption.
Authors
(none)
Tags
Stats
Related papers
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020)10.35
- Universal Approximation Theorem Of Deep Q-networks (2025)0.00
- A Theoretical Analysis Of Deep Q-learning (2019)0.00
- An \(L^2\) Analysis Of Reinforcement Learning In High Dimensions With Kernel And Neural Network Approximation (2021)0.00
- Finite-time Analysis For Double Q-learning (2020)0.00
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)10.35
- Neural Temporal-difference And Q-learning Provably Converge To Global Optima (2019)7.81
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)0.00