Abstract

Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike its single-agent counterpart, many theoretical and algorithmic aspects of MARL have not been well-understood. In this paper, we study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic algorithms (CAC) in which individually parametrized policies have a \{\it shared\} part (which is jointly optimized among all agents) and a \{\it personalized\} part (which is only locally optimized). Such kind of \{\it partially personalized\} policy allows agents to learn to coordinate by leveraging peers' past experience and adapt to individual tasks. The flexibility in our design allows the proposed MARL-CAC algorithm to be used in a \{\it fully decentralized\} setting, where the agents can only communicate with their neighbors, as well as a \{\it federated\} setting, where the a

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyzeng2021learning

Related papers