Understanding Value Decomposition Algorithms In Deep Cooperative Multi-agent Reinforcement Learning
2022 · Zehao Dou, Jakub Grudzien Kuba, Yaodong Yang
Abstract
Value function decomposition is becoming a popular rule of thumb for scaling up multi-agent reinforcement learning (MARL) in cooperative games. For such a decomposition rule to hold, the individual-global max (IGM) principle must be assumed; that is, the local maxima of each agent's decomposed value function must together amount to the global maximum of the joint value function. This principle, however, does not have to hold in general. As a result, the applicability of value decomposition algorithms remains unclear and their convergence properties remain unknown. In this paper, we make the first effort to answer these questions. Specifically, we introduce the set of cooperative games in which value decomposition methods are valid, which we refer to as decomposable games. In decomposable games, we theoretically prove that applying the multi-agent fitted Q-Iteration algorithm (MA-FQI) will lead to an optimal Q-function. In non-decomposable games, th
Related papers
- Adaptive Value Decomposition With Greedy Marginal Contribution Computation For Cooperative Multi-agent Reinforcement Learning (2023)
- Dual Self-awareness Value Decomposition Framework Without Individual Global Max For Cooperative Multi-agent Reinforcement Learning (2023)
- Qatten: A General Framework For Cooperative Multiagent Reinforcement Learning (2020)
- Beyond Monotonicity: Revisiting Factorization Principles In Multi-agent Q-learning (2025)
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021)
- Q-value Path Decomposition For Deep Multiagent Reinforcement Learning (2020)
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)