Sample Complexity Of Asynchronous Q-learning: Sharper Analysis And Variance Reduction
2020 Β· Gen Li, Yuting Wei, Yuejie Chi, et al.
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a \(\gamma\)-discounted MDP with state space \(\mathcal\{S\}\) and action space \(\mathcal\{A\}\), we demonstrate that the \(\ell_\{\infty\}\)-based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise \(\epsilon\)-accurate estimate of the Q-function --- is at most on the order of \(\frac\{1\}\{\mu_\{\min\}(1-\gamma)^5\epsilon^2\}+ \frac\{t_\{mix\}\}\{\mu_\{\min\}(1-\gamma)\}\) up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, \(t_\{mix\}\) and \(\mu_\{\min\}\) denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the sample complexity in the synchronous case with indepen
Authors
(none)
Tags
Stats
Related papers
- Is Q-learning Minimax Optimal? A Tight Sample Complexity Analysis (2021)0.00
- Sample Complexity Of Variance-reduced Distributionally Robust Q-learning (2023)0.00
- Q-learning With Uniformly Bounded Variance: Large Discounting Is Not A Barrier To Fast Learning (2020)0.00
- Variance-reduced Cascade Q-learning: Algorithms And Sample Complexity (2024)0.00
- Finite-time Analysis For Double Q-learning (2020)0.00
- Sample Efficient Reinforcement Learning With Partial Dynamics Knowledge (2023)3.58
- Finite-time Analysis Of Asynchronous Q-learning Under Diminishing Step-size From Control-theoretic View (2022)3.58
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022)0.00