Sample Complexity Of Average-reward Q-learning: From Single-agent To Federated Reinforcement Learning
2026 Β· Yuchen Jiao, Jiin Woo, Gen Li, et al.
Abstract
Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves sample complexity \(\widetilde\{O\}\left(\frac\{|\mathcal\{S\}||\mathcal\{A\}|\|h^\{\star\}\|_\{\mathsf\{sp\}\}^3\}\{\epsilon^3\}\right)\), where \(\|h^\{\star\}\|_\{\mathsf\{sp\}\}\) is the span norm of the bias function, improving previous results by at least a factor of \(\frac\{\|h^\{\star\}\|_\{\mathsf\{sp\}\}
Authors
(none)
Tags
Stats
Related papers
- A Finite Time Analysis Of Distributed Q-learning (2024)0.00
- The Sample-communication Complexity Trade-off In Federated Q-learning (2024)0.00
- Sample Complexity Of Distributionally Robust Average-reward Reinforcement Learning (2025)0.00
- Optimal Sample Complexity For Average Reward Markov Decision Processes (2023)0.00
- The Blessing Of Heterogeneity In Federated Q-learning: Linear Speedup And Beyond (2023)0.00
- Near Sample-optimal Reduction-based Policy Learning For Average Reward MDP (2022)0.00
- Sharper Model-free Reinforcement Learning For Average-reward Markov Decision Processes (2023)0.00
- Federated Q-learning: Linear Regret Speedup With Low Communication Cost (2023)0.00