Gap-dependent Bounds For Federated \(q\)-learning
2025 Β· Haochen Zhang, Zhong Zheng, Lingzhou Xue
Abstract
We present the first gap-dependent analysis of regret and communication cost for on-policy federated \(Q\)-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to \(\sqrt\{T\}\)-type regret bounds and communication cost bounds with a \(log T\) term scaling with the number of agents \(M\), states \(S\), and actions \(A\), where \(T\) is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a \(log T\)-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on \(MSA\) from the \(log T\) term. Notably, our gap-dependent communication cost bound also yields a better global switching cost whe
Authors
(none)
Tags
Stats
Related papers
- Federated Q-learning: Linear Regret Speedup With Low Communication Cost (2023)0.00
- Federated Q-learning With Reference-advantage Decomposition: Almost Optimal Regret And Logarithmic Communication Cost (2024)0.00
- Regret-optimal Q-learning With Low Cost For Single-agent And Federated Reinforcement Learning (2025)0.00
- The Sample-communication Complexity Trade-off In Federated Q-learning (2024)0.00
- Refined Regret For Adversarial Mdps With Linear Function Approximation (2023)0.00
- Regret Bounds For Decentralized Learning In Cooperative Multi-agent Dynamical Systems (2020)0.00
- Fast Rates For The Regret Of Offline Reinforcement Learning (2021)2.26
- Rgmcomm: Return Gap Minimization Via Discrete Communications In Multi-agent Reinforcement Learning (2023)6.77