Q-value Path Decomposition For Deep Multiagent Reinforcement Learning
2020 · Yaodong Yang, Jianye Hao, Guangyong Chen, et al.
Abstract
Recently, deep multiagent reinforcement learning (MARL) has become a highly active research area, as many real-world problems can be inherently viewed as multiagent systems. A particularly interesting and widely applicable class of problems is the partially observable cooperative multiagent setting, in which a team of agents learns to coordinate its behavior conditioned on private observations and a commonly shared global reward signal. One natural solution is the centralized training and decentralized execution paradigm. During centralized training, one key challenge is multiagent credit assignment: how to allocate the global reward among individual agent policies so that they coordinate better toward maximizing system-level benefit. In this paper, we propose a new method called Q-value Path Decomposition (QPD) to decompose the system's global Q-values into individual agents' Q-values. Unlike previous works which restrict the representation relation of the individ
Related papers
- Qatten: A General Framework For Cooperative Multiagent Reinforcement Learning (2020)
- Residual Q-networks For Value Function Factorizing In Multi-agent Reinforcement Learning (2022)
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)
- Understanding Value Decomposition Algorithms In Deep Cooperative Multi-agent Reinforcement Learning (2022)
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Adaptive Value Decomposition With Greedy Marginal Contribution Computation For Cooperative Multi-agent Reinforcement Learning (2023)
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019)