On Global Convergence Rates For Federated Softmax Policy Gradient Under Heterogeneous Environments
2025 · Safwan Labbi, Paul Mangold, Daniil Tiapkin, et al.
Abstract
We provide global convergence rates for vanilla and entropy-regularized federated softmax stochastic policy gradient (FedPG) with local training. We show that FedPG converges to a near-optimal policy in terms of the average agent value, with a gap controlled by the level of heterogeneity. Remarkably, we obtain the first convergence rates for entropy-regularized policy gradient with explicit constants, leveraging a projection-like operator. Our results build upon a new analysis of federated averaging for non-convex objectives, based on the observation that the Łojasiewicz-type inequalities from the single-agent setting (Mei et al., 2020) do not hold for the federated objective. This uncovers a fundamental difference between single-agent and federated reinforcement learning: while single-agent optimal policies can be deterministic, federated objectives may inherently require stochastic policies.
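To make the "local training, then averaging" structure concrete, here is a minimal sketch, not the authors' algorithm. It assumes single-state environments (one reward vector per agent), exact gradients in place of the stochastic estimates the abstract refers to, and plain parameter averaging at the server; the projection-like operator mentioned above is not reproduced. The names `fed_pg`, `local_gradient`, `reward_table`, and `lam` are hypothetical.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def local_gradient(theta, rewards, lam=0.0):
    # Exact gradient of pi . rewards + lam * entropy(pi) w.r.t. theta
    # for a softmax policy in a single-state environment (assumption).
    pi = softmax(theta)
    adv = rewards if lam == 0.0 else rewards - lam * np.log(pi)
    return pi * (adv - pi @ adv)

def fed_pg(reward_table, rounds=200, local_steps=5, lr=0.5, lam=0.0):
    # reward_table[c] is the reward vector of agent c (heterogeneous agents).
    n_agents, n_actions = reward_table.shape
    theta = np.zeros(n_actions)  # shared parameters, uniform initial policy
    for _ in range(rounds):
        local_thetas = []
        for rewards in reward_table:
            th = theta.copy()
            for _ in range(local_steps):  # local training on the agent's own environment
                th += lr * local_gradient(th, rewards, lam)
            local_thetas.append(th)
        theta = np.mean(local_thetas, axis=0)  # server averages the parameters
    return softmax(theta)

# Two agents that disagree on actions 0 and 1 but both accept action 2.
reward_table = np.array([[1.0, 0.0, 0.6],
                         [0.0, 1.0, 0.6]])
policy = fed_pg(reward_table)
avg_value = float(policy @ reward_table.mean(axis=0))
print("shared policy:", policy)
print("average agent value:", avg_value)
```

In this simplified setting, `local_steps=1` reduces to gradient ascent on the average objective, because the gradients are averaged at a common `theta`. With more local steps each agent drifts toward its own optimum before averaging, which is the heterogeneity effect the abstract says controls the gap.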
Related papers
- On The Global Convergence Rates Of Softmax Policy Gradient Methods (2020) · 0.00
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020) · 0.00
- On The Global Convergence Rates Of Decentralized Softmax Gradient Play In Markov Potential Games (2022) · 0.00
- Linear Convergence Of Entropy-regularized Natural Policy Gradient With Linear Function Approximation (2021) · 6.34
- Rethinking The Global Convergence Of Softmax Policy Gradient With Linear Function Approximation (2025) · 0.00
- Global Convergence Guarantees For Federated Policy Gradient Methods With Adversaries (2024) · 0.00
- Asynchronous Federated Reinforcement Learning With Policy Gradient Updates: Algorithm Design And Convergence Analysis (2024) · 0.00
- Matryoshka Policy Gradient For Entropy-regularized RL: Convergence And Global Optimality (2023) · 0.00