Finite-time Analysis For Double Q-learning
2020 Β· Huaqing Xiong, Lin Zhao, Yingbin Liang, et al.
Abstract
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet\{hasselt2010double\} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an \(\epsilon\)-accurate neighborhood of the global optimum by taking \(\tilde\{Ξ©\}\left(\left( \frac\{1
Authors
(none)
Tags
Stats
Related papers
- Finite-time Analysis Of Simultaneous Double Q-learning (2024)0.00
- The Mean-squared Error Of Double Q-learning (2020)0.00
- Finite-time Analysis Of Asynchronous Q-learning Under Diminishing Step-size From Control-theoretic View (2022)3.58
- On The Estimation Bias In Double Q-learning (2021)0.00
- A Discrete-time Switching System Analysis Of Q-learning (2021)8.35
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020)10.35
- From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes (2025)0.00
- Sample Complexity Of Asynchronous Q-learning: Sharper Analysis And Variance Reduction (2020)11.19