MACRPO: Multi-agent Cooperative Recurrent Policy Optimization
2021 · Eshagh Kargar, Ville Kyrki
Abstract
This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and time in two novel ways. First, we use a recurrent layer in the critic's network and propose a new framework that trains this layer on a meta-trajectory. This lets the network learn cooperation and the dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and
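The abstract names both mechanisms without defining them. The NumPy sketch below shows one plausible reading, under stated assumptions: a meta-trajectory is taken to interleave every agent's observation at step t before any agent's observation at step t + 1, and the cooperative advantage is taken to be generalized advantage estimation (GAE) on each agent's reward and value mixed with the mean of the other agents' terms. The names `build_meta_trajectory` and `cooperative_gae`, the mixing weight `beta`, and the use of a plain mean are hypothetical illustrations, not the paper's definitions.

```python
import numpy as np


def build_meta_trajectory(obs: np.ndarray) -> np.ndarray:
    """Interleave all agents' observations into one time-ordered sequence.

    obs has shape (T, N, D): T time steps, N agents, D features.
    Returns shape (T * N, D), ordered (t=0, agent 0), (t=0, agent 1), ...,
    (t=1, agent 0), ... so a recurrent critic sees every agent at step t
    before any agent at step t + 1. (Assumed ordering, not from the paper.)
    """
    T, N, D = obs.shape
    # Row-major reshape keeps the agent index varying fastest.
    return obs.reshape(T * N, D)


def cooperative_gae(rewards, values, gamma=0.99, lam=0.95, beta=0.5):
    """GAE on rewards and values mixed with the other agents' mean.

    rewards: (T, N) per-agent rewards; values: (T + 1, N), where the last
    row is the bootstrap value. Assumes N >= 2 and no episode ends inside
    the segment. beta is a hypothetical weight on the other agents' terms.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T, N = rewards.shape

    def others_mean(x):
        # For each agent i: mean of x over all agents j != i.
        return (x.sum(axis=1, keepdims=True) - x) / (N - 1)

    r = rewards + beta * others_mean(rewards)
    v = values + beta * others_mean(values)
    deltas = r + gamma * v[1:] - v[:-1]  # one-step TD errors, shape (T, N)

    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}.
    adv = np.zeros((T, N))
    running = np.zeros(N)
    for t in reversed(range(T)):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv
```

With `beta=0.0` the mixing terms vanish and `cooperative_gae` reduces to ordinary per-agent GAE, which makes the effect of sharing other agents' rewards and values easy to isolate in experiments.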
Related papers
- Multi-agent Constrained Policy Optimisation (2021)
- Multi-agent Trust Region Policy Optimization (2020)
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017)
- Counterfactual Multi-agent Policy Gradients (2017)
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021)
- Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts (2021)
- MAC-PO: Multi-agent Experience Replay Via Collective Priority Optimization (2023)
- Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability (2022)