Descent-guided Policy Gradient For Scalable Cooperative Multi-agent Learning
2026 · Shan Yang, Yang Liu
Abstract
Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise. When agents share a common reward, the actions of all \(N\) agents jointly determine each agent's learning signal, so cross-agent noise grows with \(N\). In the policy gradient setting, per-agent gradient estimate variance scales as \(\Theta(N)\), yielding sample complexity \(\mathcal{O}(N/\epsilon)\). We observe that many domains, including cloud computing, transportation, and power systems, have differentiable analytical models that prescribe efficient system states. In this work, we propose Descent-Guided Policy Gradient (DG-PG), a framework that utilizes these analytical models to provide each agent with a noise-free gradient signal, decoupling each agent's gradient from the actions of all others. We prove that DG-PG reduces gradient variance from \(\Theta(N)\) to \(\mathcal{O}(1)\), preserves the equilibria of the cooperative game, and achieves agent-independent sample
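The \(\Theta(N)\) variance claim can be checked numerically on a toy problem. The sketch below is an illustration under assumptions that are not from the paper: each agent draws a scalar action from a unit-variance Gaussian policy with mean parameter 0, the shared reward is the sum of all actions, and the gradient is the plain score-function (REINFORCE) estimate. Under these assumptions the variance of one agent's estimate works out to \(N + 1\). The "local" estimator, which uses only the agent's own action as its signal, is a hypothetical stand-in for a decoupled signal and is not the DG-PG estimator, which the abstract does not specify. The function name `grad_variances` is likewise illustrative.

```python
import numpy as np


def grad_variances(n_agents, n_samples=100_000, seed=0):
    """Monte Carlo variance of agent 0's score-function gradient estimate.

    Toy setup (an assumption, not the paper's model): a_j ~ N(theta_j, 1)
    with theta_j = 0, and shared reward R = sum_j a_j.
    Returns (variance with shared reward, variance with own-action signal).
    """
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((n_samples, n_agents))

    # Score of a unit-variance Gaussian policy: d/dtheta log pi(a) = a - theta = a.
    score = actions[:, 0]

    # Shared reward couples agent 0's estimate to every other agent's action.
    shared_reward = actions.sum(axis=1)
    grad_shared = score * shared_reward

    # Hypothetical decoupled signal: depends only on agent 0's own action.
    grad_local = score * actions[:, 0]

    return grad_shared.var(), grad_local.var()


if __name__ == "__main__":
    for n in (4, 16, 32):
        v_shared, v_local = grad_variances(n)
        print(f"N={n:3d}  shared-reward var={v_shared:7.2f}  local var={v_local:5.2f}")
```

In this toy model the shared-reward variance grows linearly with the number of agents while the own-action variance stays near 2 regardless of \(N\), which mirrors the \(\Theta(N)\) versus \(\mathcal{O}(1)\) contrast described in the abstract.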
Related papers
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- Distributed Policy Gradient With Variance Reduction In Multi-agent Reinforcement Learning (2021)
- Dimension-free Rates For Natural Policy Gradient In Multi-agent Reinforcement Learning (2021)
- Settling The Variance Of Multi-agent Policy Gradients (2021)
- Scalable Centralized Deep Multi-agent Reinforcement Learning Via Policy Gradients (2018)
- Parameter Sharing Deep Deterministic Policy Gradient For Cooperative Multi-agent Reinforcement Learning (2017)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)