Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning
2021 · Jianzhun Shao, Hongchang Zhang, Yuhang Jiang, et al.
Abstract
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, which exploits the states of all agents and the related environment to decompose Q values into individual credits, we propose a general meta-learning-based Mixing Network with Meta Policy Gradient (MNMPG) framework that distills the global hierarchy for fine-grained reward decomposition. The excitation signal for learning the global hierarchy is derived from the difference in episode reward before and after "exercise updates" through the utility network. Our method is generally applicable to CTDE methods that use a monotonic mixing network. Experiments on the StarCraft II micromanagement benchmark demonstrate that our method, even with only a simple utility network, outperforms the current state-of-the-art MARL algorithms on 4 of 5 super hard scenarios. Better performance can be further ac
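The abstract names two moving parts: a monotonic mixing network that combines per-agent utilities into a joint value, and an excitation signal taken from the episode reward before and after an exercise update. The plain-Python sketch below illustrates both under stated assumptions. The function names, the one-layer linear hypernetwork, the absolute-value weight constraint (a common QMIX-style way to keep mixing weights non-negative), averaging returns over several episodes, and the REINFORCE-style `meta_step` that treats the excitation signal as a return are all hypothetical choices for illustration, not the MNMPG implementation.

```python
import math
from typing import List, Sequence


def monotonic_mix(agent_qs: Sequence[float], state: Sequence[float],
                  w_hyper: Sequence[Sequence[float]],
                  b_hyper: Sequence[float]) -> float:
    """Combine per-agent utilities Q_i into a joint Q_tot (hypothetical mixer).

    Assumed one-layer linear hypernetwork conditioned on the global state s:
      w_i(s) = |<w_hyper[i], s>|   (absolute value keeps every weight >= 0)
      b(s)   = <b_hyper, s>
    Non-negative weights make Q_tot non-decreasing in each Q_i (monotonic mixing).
    """
    if len(w_hyper) != len(agent_qs):
        raise ValueError("expected one hypernetwork row per agent")
    weights = [abs(sum(w * s for w, s in zip(row, state))) for row in w_hyper]
    bias = sum(b * s for b, s in zip(b_hyper, state))
    return sum(w * q for w, q in zip(weights, agent_qs)) + bias


def excitation_signal(returns_before: Sequence[float],
                      returns_after: Sequence[float]) -> float:
    """Mean episode return after an exercise update minus the mean return before it."""
    if not returns_before or not returns_after:
        raise ValueError("need at least one episode on each side")
    before = math.fsum(returns_before) / len(returns_before)
    after = math.fsum(returns_after) / len(returns_after)
    return after - before


def meta_step(params: Sequence[float], grad_log_prob: Sequence[float],
              signal: float, lr: float = 0.01) -> List[float]:
    """Assumed REINFORCE-style ascent step: theta <- theta + lr * signal * grad log pi.

    The excitation signal plays the role of the return here; how MNMPG actually
    propagates gradients through the exercise update is not modeled.
    """
    return [p + lr * signal * g for p, g in zip(params, grad_log_prob)]


if __name__ == "__main__":
    q_tot = monotonic_mix([1.0, 2.0], state=[1.0, -1.0],
                          w_hyper=[[0.5, 1.0], [2.0, 0.0]], b_hyper=[0.1, 0.2])
    signal = excitation_signal([10.0, 12.0], [15.0, 13.0])
    print(q_tot, signal, meta_step([0.0, 1.0], [1.0, -2.0], signal, lr=0.1))
```

Taking the absolute value is only one way to enforce ∂Q_tot/∂Q_i ≥ 0; any non-negative transform of the hypernetwork output would preserve monotonicity in this sketch. A positive excitation signal (the exercise update raised episode return) pushes the meta parameters along the supplied gradient, and a negative one pushes them away.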
Related papers
- Assigning Credit With Partial Reward Decoupling In Multi-agent Proximal Policy Optimization (2024)
- Learning Explicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning Via Polarization Policy Gradient (2022)
- Counterfactual Multi-agent Policy Gradients (2017)
- ProMP: Proximal Meta-policy Search (2018)
- Difference Rewards Policy Gradients (2020)
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- Learning Implicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning (2020)
- QLLM: Do We Really Need A Mixing Network For Credit Assignment In Multi-agent Reinforcement Learning? (2025)