MCMARL: Parameterizing Value Function Via Mixture Of Categorical Distributions For Multi-agent Reinforcement Learning
2022 · Jian Zhao, Mingyu Yang, Youpeng Zhao, et al.
Abstract
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a team reward and observing the next state. During the interactions, the uncertainty of environment and reward will inevitably induce stochasticity in the long-term returns and the randomness can be exacerbated with the increasing number of agents. However, such randomness is ignored by most of the existing value-based multi-agent reinforcement learning (MARL) methods, which only model the expectation of Q-value for both individual agents and the team. Compared to using the expectations of the long-term returns, it is preferable to directly model the stochasticity by estimating the returns through distributions. With this motivation, this work proposes a novel value-based MARL framework from a distributional perspective, *i.e.*, parameterizing value function via Mixture of Categorical distributions for MARL. Specifically, we model both individ
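The abstract's contrast between modeling only the expected Q-value and modeling the return as a distribution can be illustrated with a small sketch. It assumes a categorical distribution over a fixed set of evenly spaced return values ("atoms"); the helper names, the atom range, and the two example distributions are hypothetical and chosen only for illustration. This is not the paper's mixing method, which the truncated abstract does not describe.

```python
def make_atoms(v_min, v_max, n_atoms):
    """Evenly spaced return values that form the fixed support."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def expected_value(atoms, probs):
    """Mean of the categorical distribution: the usual scalar Q-value."""
    return sum(z * p for z, p in zip(atoms, probs))


def variance(atoms, probs):
    """Spread of the return around its mean, which a scalar Q-value discards."""
    mean = expected_value(atoms, probs)
    return sum(p * (z - mean) ** 2 for z, p in zip(atoms, probs))


# Hypothetical support: five atoms from -10 to 10.
atoms = make_atoms(-10.0, 10.0, 5)

# Two made-up return distributions with the same mean but different risk.
safe = [0.0, 0.0, 1.0, 0.0, 0.0]   # always returns 0
risky = [0.5, 0.0, 0.0, 0.0, 0.5]  # returns -10 or +10 with equal probability

for name, probs in (("safe", safe), ("risky", risky)):
    print(name, expected_value(atoms, probs), variance(atoms, probs))
```

Both distributions have the same expectation, so an expectation-only method cannot tell them apart, while the categorical representation keeps the difference in spread.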
Related papers
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- DFAC Framework: Factorizing The Value Function Via Quantile Mixture For Multi-agent Distributional Q-learning (2021)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)
- A Unified Framework For Factorizing Distributional Value Functions For Multi-agent Reinforcement Learning (2023)
- QR-MIX: Distributional Value Function Factorisation For Cooperative Multi-agent Reinforcement Learning (2020)
- Inducing Cooperation Via Team Regret Minimization Based Multi-agent Deep Reinforcement Learning (2019)
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021)
- Risk-aware Distributed Multi-agent Reinforcement Learning (2023)