From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes

Abstract

This work presents the first finite-time analysis for the last-iterate convergence of average-reward \(Q\)-learning with an asynchronous implementation. A key feature of the algorithm we study is the use of adaptive stepsizes, which serve as local clocks for each state-action pair. We show that, under appropriate assumptions, the iterates generated by this \(Q\)-learning algorithm converge at a rate of \(\tilde\{\mathcal\{O\}\}(1/k)\) (in the mean-square sense) to the optimal \(Q\)-function in the span seminorm. Moreover, by adding a centering step to the algorithm, we further establish pointwise mean-square convergence to the centered optimal \(Q\)-function, also at a rate of \(\tilde\{\mathcal\{O\}\}(1/k)\). To prove these results, we show that adaptive stepsizes are necessary, as without them, the algorithm fails to converge to the correct target. In addition, adaptive stepsizes can be interpreted as a form of implicit importance sampling that counteracts the effects of asynchronous

From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes

Abstract

Authors

Tags

Stats

Related papers