Abstract

Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named \(\texttt{MFPO}\), that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that, with a proper selection of momentum parameters and interaction frequency, \(\texttt{MFPO}\) can achieve \(\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})\) interaction complexity and \(\tilde{\mathcal{O}}(\epsilon^{-1})\) communication complexity, where \(N\) is the number of agents. The interaction complexity achieves linear speedup with the number of agents, and the communication complexity matches the best achievable of existing first-order FL algorith

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 3
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 4.52
  • arxiv key: yue2024momentum

Related papers