Constant Stepsize Q-learning: Distributional Convergence, Bias And Extrapolation
2024 · Yixuan Zhang, Qiaomin Xie
Abstract
Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate. We also establish a Central Limit Theorem for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates. Moreover, we provide an explicit expansion of the asymptotic bias of the averaged iterate in the stepsize. Specifically, the bias is proportional to the stepsize up to higher-order terms, and we provide an explicit expression for the linear coefficient. This precise characterization of the bias allows
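To make the objects in the abstract concrete, the following is a minimal sketch, not the authors' code, of asynchronous Q-learning with a constant stepsize on a small tabular MDP, together with the tail average of the iterates. The random MDP, the uniformly random behavior policy, and all names (`make_random_mdp`, `q_learning_constant_stepsize`, `q_star`) are assumptions made for illustration. The last lines combine the averages at stepsizes `alpha` and `2 * alpha` as `2 * avg_a - avg_2a`; if the bias is `alpha * B` plus higher-order terms, this combination cancels the linear term. Reading the "extrapolation" in the title this way is also an assumption, since the abstract above is cut off before it describes that step.

```python
import numpy as np


def make_random_mdp(n_states, n_actions, seed=0):
    """Hypothetical test MDP: random transition kernel and rewards in [0, 1]."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # (S, A, S)
    R = rng.uniform(size=(n_states, n_actions))                       # (S, A)
    return P, R


def q_star(P, R, gamma, n_iter=2000):
    """Reference solution of the Bellman optimality equation by value iteration."""
    q = np.zeros_like(R)
    for _ in range(n_iter):
        q = R + gamma * P @ q.max(axis=1)
    return q


def q_learning_constant_stepsize(P, R, gamma, alpha, n_steps, burn_in, seed=0):
    """Asynchronous Q-learning along one trajectory with constant stepsize alpha.

    Assumes a uniformly random behavior policy. Returns the last iterate and
    the average of the iterates after the burn-in period.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    q = np.zeros((n_states, n_actions))
    q_sum = np.zeros_like(q)
    s = 0
    for k in range(n_steps):
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])
        td_target = R[s, a] + gamma * q[s_next].max()
        q[s, a] += alpha * (td_target - q[s, a])  # only the visited entry moves
        if k >= burn_in:
            q_sum += q
        s = s_next
    return q, q_sum / (n_steps - burn_in)


# Usage: compare averaged iterates at two stepsizes and their extrapolation.
P, R = make_random_mdp(n_states=3, n_actions=2, seed=1)
gamma, alpha = 0.9, 0.05
target = q_star(P, R, gamma)
_, avg_a = q_learning_constant_stepsize(P, R, gamma, alpha, 20000, 5000, seed=0)
_, avg_2a = q_learning_constant_stepsize(P, R, gamma, 2 * alpha, 20000, 5000, seed=0)
extrapolated = 2 * avg_a - avg_2a  # cancels a bias term that is linear in alpha
for name, est in [("alpha", avg_a), ("2*alpha", avg_2a), ("extrapolated", extrapolated)]:
    print(name, np.abs(est - target).max())
```

On a single finite run the printed errors mix bias with sampling noise, so the extrapolated estimate is not guaranteed to be the smallest; the bias statement in the abstract concerns the asymptotic expectation of the averaged iterate.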
Related papers
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019) · 10.35
- A Distributional Analysis Of Sampling-based Reinforcement Learning Algorithms (2020) · 0.00
- From Set Convergence To Pointwise Convergence: Finite-time Guarantees For Average-reward Q-learning With Adaptive Stepsizes (2025) · 0.00
- A Discrete-time Switching System Analysis Of Q-learning (2021) · 8.35
- Finite-time Analysis Of Asynchronous Q-learning Under Diminishing Step-size From Control-theoretic View (2022) · 3.58
- Non-asymptotic Analysis Of Biased Stochastic Approximation Scheme (2019) · 0.00
- Asymptotic Analysis Of Sample-averaged Q-learning (2024) · 0.00
- Sample Complexity Bounds For Two Timescale Value-based Reinforcement Learning Algorithms (2020) · 0.00