FACMAC: Factored Multi-agent Centralised Policy Gradients
2020 · Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, et al.
Abstract
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. Unlike in QMIX, however, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic or monotonically factored critics. In addition, FACMAC uses a centralised policy gradient estimator that optimises over the entire joint action space, rather than optimising over each agent's ac
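The monotonic factorisation mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a two-layer mixer whose weights are passed through an absolute value so that the joint value never decreases when any per-agent utility increases. The function names (`monotonic_mix`, `make_mixer_params`) and the fixed random weights are hypothetical; in QMIX-style mixers the weights are typically produced from the global state by hypernetworks, which is omitted here.

```python
import math
import random


def elu(x):
    """ELU activation: strictly increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(utilities, w1, b1, w2, b2):
    """Combine per-agent utilities into one joint value.

    Taking abs() of every weight keeps each partial derivative
    d(Q_tot)/d(Q_i) non-negative. Biases are unconstrained.
    """
    hidden = [
        elu(sum(abs(w) * q for w, q in zip(row, utilities)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(abs(w) * h for w, h in zip(w2, hidden)) + b2


def make_mixer_params(n_agents, n_hidden, seed=0):
    """Hypothetical helper: fixed random weights standing in for learned ones."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(n_agents)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    b2 = rng.gauss(0, 1)
    return w1, b1, w2, b2


params = make_mixer_params(n_agents=3, n_hidden=4)
q_low = monotonic_mix([0.1, -0.5, 1.0], *params)
q_high = monotonic_mix([0.1, 0.2, 1.0], *params)  # only agent 2's utility raised
print(q_high >= q_low)
```

Dropping the `abs()` calls would give an unconstrained mixer, which is one simple way to picture the nonmonotonic factorisation the abstract refers to.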
Related papers
- More Centralized Training, Still Decentralized Execution: Multi-agent Conditional Policy Factorization (2022)
- Counterfactual Multi-agent Policy Gradients (2017)
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021)
- Mean Actor Critic (2017)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- Factored Policy Gradients: Leveraging Structure For Efficient Learning In MOMDPs (2021)
- Scalable Centralized Deep Multi-agent Reinforcement Learning Via Policy Gradients (2018)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic For Cooperative Multi-agent Reinforcement Learning (2020)