From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes
2025 Β· Zaiwei Chen, Phalguni Nanda
Abstract
This work presents the first finite-time analysis for the last-iterate convergence of average-reward \(Q\)-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as local clocks for each state-action pair. We show that, under appropriate assumptions, the iterates generated by this \(Q\)-learning algorithm converge at a rate of \(\tilde\{\mathcal\{O\}\}(1/k)\) (in the mean-square sense) to the optimal \(Q\)-function in the span seminorm. Moreover, by adding a centering step to the algorithm, we further establish pointwise mean-square convergence to the centered optimal \(Q\)-function, also at a rate of \(\tilde\{\mathcal\{O\}\}(1/k)\). To prove these results, we show that adaptive stepsizes are necessary, as without them, the algorithm fails to converge to the correct target. In addition, adaptive stepsizes can be interpreted as a form of implicit importance sampling that counteracts the effects of asynchronous
Authors
(none)
Tags
Stats
Related papers
- Finite-time Analysis Of Asynchronous Q-learning Under Diminishing Step-size From Control-theoretic View (2022)3.58
- Constant Stepsize Q-learning: Distributional Convergence, Bias And Extrapolation (2024)0.00
- A Discrete-time Switching System Analysis Of Q-learning (2021)8.35
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)10.35
- Finite-time Analysis For Double Q-learning (2020)0.00
- A Distributional Analysis Of Sampling-based Reinforcement Learning Algorithms (2020)0.00
- Approximate Information State Based Convergence Analysis Of Recurrent Q-learning (2023)0.00
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023)0.00