Multi-agent Constrained Policy Optimisation
2021 · Shangding Gu, Jakub Grudzien Kuba, Munning Wen, et al.
Abstract
Developing reinforcement learning algorithms that satisfy safety constraints is becoming increasingly important in real-world applications. In multi-agent reinforcement learning (MARL) settings, policy optimisation with safety awareness is particularly challenging because each individual agent has to not only meet its own safety constraints, but also consider those of others so that their joint behaviour can be guaranteed safe. Despite its importance, the problem of safe multi-agent learning has not been rigorously studied; very few solutions have been proposed, nor a sharable testing environment or benchmarks. To fill these gaps, in this work, we formulate the safe MARL problem as a constrained Markov game and solve it with policy optimisation methods. Our solutions -- Multi-Agent Constrained Policy Optimisation (MACPO) and MAPPO-Lagrangian -- leverage the theories from both constrained policy optimisation and multi-agent trust region learning. Crucially, our methods enjoy theoretical
Related papers
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)
- Safe Multi-agent Reinforcement Learning With Convergence To Generalized Nash Equilibrium (2024)
- Multi-agent Trust Region Policy Optimization (2020)
- Trust Region Policy Optimisation In Multi-agent Reinforcement Learning (2021)
- Model-based Safe Deep Reinforcement Learning Via A Constrained Proximal Policy Optimization Algorithm (2022)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)