Learning Explicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning Via Polarization Policy Gradient
2022 · Wubing Chen, Wenbin Li, Xiao Liu, et al.
Abstract
Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG uses a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment in MAPG. Theoretically, we prove that the individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on the well-known matrix game and differential g
Related papers
- Cooperative Game-theoretic Credit Assignment For Multi-agent Policy Gradients Via The Core (2025) · 0.00
- Assigning Credit With Partial Reward Decoupling In Multi-agent Proximal Policy Optimization (2024) · 0.00
- Asynchronous, Option-based Multi-agent Policy Gradient: A Conditional Reasoning Approach (2022) · 0.00
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021) · 0.00
- TAPE: Leveraging Agent Topology For Cooperative Multi-agent Policy Gradient (2023) · 3.58
- Learning Implicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning (2020) · 0.00
- Optimistic Multi-agent Policy Gradient (2023) · 0.00
- Counterfactual Multi-agent Policy Gradients (2017) · 0.00