Federated Q-learning: Linear Regret Speedup With Low Communication Cost
2023 · Zhong Zheng, Fengyu Gao, Lingzhou Xue, et al.
Abstract
In this paper, we consider federated reinforcement learning for tabular episodic Markov Decision Processes (MDPs) where, under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. While linear speedup in the number of agents has been achieved for some metrics, such as convergence rate and sample complexity, in similar settings, it is unclear whether it is possible to design a model-free algorithm to achieve linear regret speedup with low communication cost. We propose two federated Q-learning algorithms, termed FedQ-Hoeffding and FedQ-Bernstein, and show that the corresponding total regrets achieve a linear speedup compared with their single-agent counterparts when the time horizon is sufficiently large, while the communication cost scales logarithmically in the total number of time steps \(T\). These results rely on an event-triggered synchronization mechanism between th
Related papers
- Federated Q-learning With Reference-advantage Decomposition: Almost Optimal Regret And Logarithmic Communication Cost (2024)
- Gap-dependent Bounds For Federated \(Q\)-learning (2025)
- The Blessing Of Heterogeneity In Federated Q-learning: Linear Speedup And Beyond (2023)
- The Sample-communication Complexity Trade-off In Federated Q-learning (2024)
- Regret-optimal Q-learning With Low Cost For Single-agent And Federated Reinforcement Learning (2025)
- Federated Offline Reinforcement Learning: Collaborative Single-policy Coverage Suffices (2024)
- Federated TD Learning Over Finite-rate Erasure Channels: Linear Speedup Under Markovian Sampling (2023)
- Provably Efficient Multi-agent Reinforcement Learning With Fully Decentralized Communication (2021)