Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning
2022 · Wei Fu, Chao Yu, Zelai Xu, et al.
Abstract
Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this kind decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite these advantages, we revisit the two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implement
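To make the abstract's concern about value decomposition concrete, here is a minimal sketch, not the authors' code. It assumes a hypothetical one-step, two-agent "XOR" payoff (reward 1 only when the agents pick different actions) as a stand-in for a multi-modal reward landscape, and it assumes the simplest additive decomposition, Q(a1, a2) ≈ q1(a1) + q2(a2). The helper name `additive_fit` is illustrative. The least-squares additive fit is computed in closed form from row and column means.

```python
def additive_fit(payoff):
    """Least-squares fit of payoff[i][j] by q1[i] + q2[j] (uniform weighting).

    For a full payoff table the best additive fit is
    row_mean[i] + col_mean[j] - grand_mean; the grand mean is split
    evenly between the two local value tables.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    row_mean = [sum(row) / n_cols for row in payoff]
    col_mean = [sum(payoff[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    q1 = [m - grand / 2 for m in row_mean]
    q2 = [m - grand / 2 for m in col_mean]
    return q1, q2


def reconstruct(q1, q2):
    """Joint value table implied by the two local tables."""
    return [[a + b for b in q2] for a in q1]


# Hypothetical XOR payoff: two optimal joint actions, (0, 1) and (1, 0).
xor_payoff = [[0.0, 1.0],
              [1.0, 0.0]]
q1_xor, q2_xor = additive_fit(xor_payoff)
xor_fitted = reconstruct(q1_xor, q2_xor)

# Control case: a payoff that really is additive is recovered exactly.
additive_payoff = [[0.0, 1.0],
                   [1.0, 2.0]]
q1_add, q2_add = additive_fit(additive_payoff)
additive_fitted = reconstruct(q1_add, q2_add)

print("XOR fit:", xor_fitted)
print("Additive fit:", additive_fitted)
```

Under these assumptions the fitted XOR table is flat, so each agent's local values are identical across its actions and greedy decentralized action selection cannot tell the two optimal joint actions from the two bad ones. The additive control payoff is reproduced exactly, which shows the failure comes from the payoff structure rather than from the fitting procedure.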
Related papers
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Q-value Path Decomposition For Deep Multiagent Reinforcement Learning (2020)
- A Review Of Cooperative Multi-agent Deep Reinforcement Learning (2019)
- HyperMARL: Adaptive Hypernetworks For Multi-agent RL (2024)
- Benchmarking Multi-agent Deep Reinforcement Learning Algorithms In Cooperative Tasks (2020)
- Towards Global Optimality In Cooperative MARL With The Transformation And Distillation Framework (2022)
- An Initial Introduction To Cooperative Multi-agent Reinforcement Learning (2024)
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)