MCMARL: Parameterizing Value Function Via Mixture Of Categorical Distributions For Multi-agent Reinforcement Learning

Abstract

In cooperative multi-agent tasks, a team of agents jointly interacts with an environment by taking actions, receiving a team reward, and observing the next state. During these interactions, uncertainty in the environment and the reward inevitably induces stochasticity in the long-term returns, and this randomness can be exacerbated as the number of agents increases. However, most existing value-based multi-agent reinforcement learning (MARL) methods ignore this randomness and model only the expectation of the Q-value, for both the individual agents and the team. Rather than relying on the expectation of the long-term returns, it is preferable to model the stochasticity directly by estimating the returns as distributions. With this motivation, this work proposes a novel value-based MARL framework from a distributional perspective, *i.e.*, parameterizing the value function via a Mixture of Categorical distributions for MARL (MCMARL). Specifically, we model both individ
