Assigning Credit With Partial Reward Decoupling In Multi-agent Proximal Policy Optimization
2024 · Aditya Kapoor, Benjamin Freed, Howie Choset, et al.
Abstract
Multi-agent proximal policy optimization (MAPPO) has recently demonstrated state-of-the-art performance on challenging multi-agent reinforcement learning tasks. However, MAPPO still struggles with the credit assignment problem, wherein the sheer difficulty in ascribing credit to individual agents' actions scales poorly with team size. In this paper, we propose a multi-agent reinforcement learning algorithm that adapts recent developments in credit assignment to improve upon MAPPO. Our approach leverages partial reward decoupling (PRD), which uses a learned attention mechanism to estimate which of a particular agent's teammates are relevant to its learning updates. We use this estimate to dynamically decompose large groups of agents into smaller, more manageable subgroups. We empirically demonstrate that our approach, PRD-MAPPO, decouples agents from teammates that do not influence their expected future reward, thereby streamlining credit assignment. We additionally show that PRD-MAPPO
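The decoupling idea in the abstract can be illustrated with a small Python sketch. It shows how per-agent relevance estimates could limit which teammates' rewards enter each agent's return. Several pieces are assumptions rather than the authors' implementation: the attention weights are taken as a given (T, N, N) array, a teammate counts as relevant when its weight exceeds a fixed threshold (0.01 here), and the functions `relevant_sets` and `decoupled_returns` are hypothetical names.

```python
import numpy as np


def relevant_sets(attention: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Turn attention weights into boolean relevance masks.

    attention: (T, N, N) array; attention[t, i, j] is a hypothetical critic
    attention weight of agent i on teammate j at timestep t.
    Returns a (T, N, N) boolean mask in which each agent is always
    relevant to itself.
    """
    n_agents = attention.shape[-1]
    relevant = attention > threshold  # assumed fixed-threshold rule
    return relevant | np.eye(n_agents, dtype=bool)  # broadcasts over T


def decoupled_returns(rewards: np.ndarray, relevant: np.ndarray,
                      gamma: float = 0.99) -> np.ndarray:
    """Discounted returns where agent i sums only its relevant teammates' rewards.

    rewards: (T, N) per-agent rewards; relevant: (T, N, N) boolean mask.
    """
    # Per-step group reward: r_i'(t) = sum_j relevant[t, i, j] * r_j(t)
    group_rewards = np.einsum("tij,tj->ti", relevant.astype(float), rewards)
    returns = np.zeros_like(group_rewards)
    running = np.zeros(rewards.shape[1])
    # Accumulate discounted returns backwards in time, one value per agent.
    for t in reversed(range(rewards.shape[0])):
        running = group_rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Agents 0 and 1 attend to each other; agent 2 attends only to itself.
    attention = np.tile(np.array([[0.7, 0.3, 0.0],
                                  [0.5, 0.5, 0.0],
                                  [0.0, 0.0, 1.0]]), (2, 1, 1))
    rewards = np.array([[1.0, 2.0, 4.0], [1.0, 2.0, 4.0]])
    print(decoupled_returns(rewards, relevant_sets(attention), gamma=0.5))
```

In this toy run, agents 0 and 1 form one subgroup and agent 2 is decoupled. Agent 2's first-step return is 6.0, which is its own reward of 4 plus 0.5 × 4. Agents 0 and 1 each get 4.5. With no decoupling, every agent would instead see the full team reward of 7 per step.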
Related papers
- Learning Explicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning Via Polarization Policy Gradient (2022)
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021)
- Asynchronous Credit Assignment For Multi-agent Reinforcement Learning (2024)
- Cooperative Game-theoretic Credit Assignment For Multi-agent Policy Gradients Via The Core (2025)
- ProMP: Proximal Meta-policy Search (2018)
- Learning Implicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning (2020)
- Shapley Counterfactual Credits For Multi-agent Reinforcement Learning (2021)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)