Learning To Coordinate In Multi-agent Systems: A Coordinated Actor-critic Algorithm And Finite-time Guarantees

Abstract

Multi-agent reinforcement learning (MARL) has attracted much research attention recently. However, unlike in the single-agent setting, many theoretical and algorithmic aspects of MARL are not yet well understood. In this paper, we study the emergence of coordinated behavior by autonomous agents using an actor-critic (AC) algorithm. Specifically, we propose and analyze a class of coordinated actor-critic (CAC) algorithms in which individually parametrized policies have a shared part (which is jointly optimized among all agents) and a personalized part (which is only locally optimized). This kind of partially personalized policy allows agents to learn to coordinate by leveraging peers' past experience and to adapt to individual tasks. The flexibility in our design allows the proposed MARL-CAC algorithm to be used in a fully decentralized setting, where the agents can only communicate with their neighbors, as well as a federated setting, where the a
