Inducing Cooperation Via Team Regret Minimization Based Multi-agent Deep Reinforcement Learning
2019 · Runsheng Yu, Zhenyu Shi, Xinrun Wang, et al.
Abstract
Existing value-factorization-based Multi-Agent deep Reinforcement Learning (MARL) approaches perform well in various multi-agent cooperative environments under the centralized training and decentralized execution (CTDE) scheme, where all agents are trained together by the centralized value network and each agent executes its policy independently. However, an issue remains open: in the centralized training process, when the environment for the team is partially observable or non-stationary, i.e., the observation and action information of all the agents cannot represent the global states, existing methods perform poorly and are sample-inefficient. Regret Minimization (RM) can be a promising approach as it performs well in partially observable and fully competitive settings. However, it tends to model others as opponents and thus cannot work well under the CTDE scheme. In this work, we propose a novel team RM based Bayesian MARL with three key contributions: (a) we design a novel RM method to tra
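For readers unfamiliar with regret minimization, the following is a minimal sketch of standard single-agent regret matching, the kind of RM procedure the abstract contrasts with CTDE-style training. It is not the team RM method proposed in the paper, whose details are not given in this excerpt. The function names `regret_matching_strategy` and `update_regret` are hypothetical and chosen for illustration only.

```python
def regret_matching_strategy(cumulative_regret):
    """Turn cumulative regrets into a strategy (probability per action).

    Standard regret matching: play each action in proportion to its
    positive cumulative regret; fall back to uniform if none is positive.
    """
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    n = len(cumulative_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def update_regret(cumulative_regret, action_utilities, played_action):
    """Accumulate regret after one round.

    The regret of action a is the utility a would have earned minus the
    utility actually earned by the played action (assumes the utilities
    of all actions are observable, which is an illustrative assumption).
    """
    realized = action_utilities[played_action]
    return [r + (u - realized)
            for r, u in zip(cumulative_regret, action_utilities)]


# One illustrative round with two actions: action 0 was played,
# but action 1 would have paid more, so it gains positive regret.
regret = update_regret([0.0, 0.0], [1.0, 3.0], played_action=0)
strategy = regret_matching_strategy(regret)
```

In this sketch each agent treats everything outside itself as part of the environment, which is one way to see why, as the abstract notes, plain RM tends to model other agents as opponents rather than teammates.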
Related papers
- Regret Bounds For Decentralized Learning In Cooperative Multi-agent Dynamical Systems (2020) 0.00
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019) 10.48
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022) 0.00
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021) 0.00
- Adaptive Value Decomposition With Greedy Marginal Contribution Computation For Cooperative Multi-agent Reinforcement Learning (2023) 3.58
- More Centralized Training, Still Decentralized Execution: Multi-agent Conditional Policy Factorization (2022) 0.00
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019) 0.00
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025) 0.00