Abstract

In many real-world tasks, multiple agents must learn to coordinate with each other given their private observations and limited communication ability. Deep multiagent reinforcement learning (Deep-MARL) algorithms have shown superior performance in such challenging settings. One representative class of work is multiagent value decomposition, which decomposes the global shared multiagent Q-value \(Q_{tot}\) into individual Q-values \(Q^{i}\) to guide individuals' behaviors; e.g., VDN imposes an additive form and QMIX adopts a monotonic assumption using an implicit mixing method. However, most previous efforts impose certain assumptions on the relationship between \(Q_{tot}\) and \(Q^{i}\) and lack theoretical grounding. Moreover, they do not explicitly consider the agent-level impact of individuals on the whole system when transforming individual \(Q^{i}\)s into \(Q_{tot}\). In this paper, we theoretically derive a general formula of \(Q_{tot}\) in terms of \(Q^{i}\), based on

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: yang2020qatten

Related papers