Cooperative Multi-agent RL With Communication Constraints
2026 · Nuoya Xiong, Aarti Singh
Abstract
Cooperative MARL often assumes frequent access to global information in a data buffer, such as team rewards or other agents' actions. This assumption is typically unrealistic in decentralized MARL systems because of high communication costs. When communication is limited, agents must rely on outdated information to estimate gradients and update their policies. A common approach to handling missing data is importance sampling, in which old data collected under a base policy is reweighted to estimate gradients for the current policy. However, importance sampling quickly becomes unstable when communication is limited (i.e., when the probability of missing data is high), because the base policy then becomes outdated. To address this issue, we propose a technique called base policy prediction, which uses old gradients to predict the policy update and collects samples for a sequence of base policies, reducing the gap between the base policy and the current policy. This approach enables effective learning with significa
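The abstract's central problem, importance-sampling weights degrading as the base policy grows stale, can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes a single-state softmax policy over three actions with hypothetical fixed rewards `r`, exact local gradient steps since the last communication round, and a hypothetical `predict_base` that linearly extrapolates the last shared gradient. That is one simple reading of "using old gradients to predict the policy update"; the paper collects samples for a sequence of base policies, which this sketch does not model. `is_second_moment` computes E[w²] under the base policy, a standard proxy for importance-sampling variance that equals 1 when the base and current policies coincide.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    # Gradient of log softmax(theta)[a] w.r.t. theta: one_hot(a) - pi
    g = -softmax(theta)
    g[a] += 1.0
    return g

def exact_grad(theta, r):
    # dJ/dtheta for J(theta) = sum_a pi(a) r(a) with a softmax policy
    p = softmax(theta)
    return p * (r - p @ r)

def is_grad_estimate(theta_cur, theta_base, actions, r):
    # REINFORCE estimate for the current policy from samples drawn under the
    # base policy, each reweighted by pi_cur(a) / pi_base(a)
    w = softmax(theta_cur)[actions] / softmax(theta_base)[actions]
    terms = [wi * r[a] * grad_log_pi(theta_cur, a) for wi, a in zip(w, actions)]
    return np.mean(terms, axis=0)

def is_second_moment(theta_cur, theta_base):
    # E_base[w^2] = sum_a pi_cur(a)^2 / pi_base(a); equals 1 when the policies
    # match and grows as the base policy drifts, inflating estimator variance
    p_cur, p_base = softmax(theta_cur), softmax(theta_base)
    return float(np.sum(p_cur ** 2 / p_base))

def predict_base(theta_synced, stale_grad, lr, steps_ahead):
    # Hypothetical predictor: linearly extrapolate the last shared gradient
    return theta_synced + lr * steps_ahead * stale_grad

r = np.array([1.0, 0.0, 0.5])   # toy per-action rewards (assumed)
lr, k = 0.5, 5                  # step size and number of updates since last sync
theta_synced = np.zeros(3)      # parameters at the last communication round
theta_cur = theta_synced.copy()
for _ in range(k):              # local updates made without fresh shared data
    theta_cur = theta_cur + lr * exact_grad(theta_cur, r)

stale_base = theta_synced       # naive choice: reuse the outdated policy
pred_base = predict_base(theta_synced, exact_grad(theta_synced, r), lr, k)

rng = np.random.default_rng(0)
actions = rng.choice(3, size=20000, p=softmax(pred_base))
est = is_grad_estimate(theta_cur, pred_base, actions, r)

print("E[w^2], stale base:    ", is_second_moment(theta_cur, stale_base))
print("E[w^2], predicted base:", is_second_moment(theta_cur, pred_base))
print("IS estimate:", est, "exact:", exact_grad(theta_cur, r))
```

With these settings the predicted base policy should give a second moment closer to 1 than the stale one, i.e. lower-variance reweighted gradients, while the estimator stays unbiased in both cases; the exact numbers depend on the assumed `lr`, `k`, and `r`.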
Related papers
- Provably Efficient Multi-agent Reinforcement Learning With Fully Decentralized Communication (2021)
- Hypermarl: Adaptive Hypernetworks For Multi-agent RL (2024)
- Asynchronous Cooperative Multi-agent Reinforcement Learning With Limited Communication (2025)
- Remembering The Markov Property In Cooperative MARL (2025)
- AC2C: Adaptively Controlled Two-hop Communication For Multi-agent Reinforcement Learning (2023)
- Efficient Communication Via Self-supervised Information Aggregation For Online And Offline Multi-agent Reinforcement Learning (2023)
- Cooperative Multi-agent Reinforcement Learning With Partial Observations (2020)
- Robust Multi-agent Reinforcement Learning With Social Empowerment For Coordination And Communication (2020)