Counterfactual Multi-agent Policy Gradients
2017 · Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, et al.
Abstract
Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability
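The counterfactual baseline described in the abstract can be illustrated with a minimal sketch. It assumes the centralised critic has already returned, in one forward pass, a Q-value for each action available to a single agent while the other agents' actions are held fixed. The function name, argument names, and numbers below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy_probs: Sequence[float],
    taken_action: int,
) -> float:
    """Advantage of one agent's action against a counterfactual baseline.

    q_values[k]     : critic estimate if this agent took action k while the
                      other agents' actions stay fixed (assumed given).
    policy_probs[k] : this agent's policy probability of action k.
    taken_action    : index of the action the agent actually took.
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must have the same length")
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[taken_action] - baseline


# Hypothetical example: one agent with three actions.
q = [1.0, 3.0, 2.0]
pi = [0.5, 0.25, 0.25]
adv = counterfactual_advantage(q, pi, taken_action=1)

# By construction, the advantage averages to zero under the agent's policy.
expected_adv = sum(
    p * counterfactual_advantage(q, pi, k) for k, p in enumerate(pi)
)
```

Because the baseline depends only on the other agents' actions and the agent's own policy, subtracting it changes the variance of the gradient estimate but not its expectation, which is why the policy-weighted average of the advantage is zero in this sketch.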
Related papers
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021) · 0.00
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) · 7.81
- Counterfactual Multi-agent Reinforcement Learning With Graph Convolution Communication (2020) · 0.00
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018) · 0.00
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017) · 0.00
- Cooperative Game-theoretic Credit Assignment For Multi-agent Policy Gradients Via The Core (2025) · 0.00
- Difference Rewards Policy Gradients (2020) · 0.00
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021) · 0.00