Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL
2026 Β· Shrenik Patel, Christine Truong
Abstract
Constrained multi-agent reinforcement learning (MARL) faces a fundamental tension between exploration and safety-constrained optimization. Existing leading approaches, such as Lagrangian methods, typically rely on global penalties or centralized critics that react to violations after they occur, often suppressing exploration and leading to over-conservatism. We propose Co2PO, a novel MARL communication-augmented framework that enables coordination-driven safety through selective, risk-aware communication. Co2PO introduces a shared blackboard architecture for broadcasting positional intent and yield signals, governed by a learned hazard predictor that proactively forecasts potential violations over an extended temporal horizon. By integrating these forecasts into a constrained optimization objective, Co2PO allows agents to anticipate and navigate collective hazards without the performance trade-offs inherent in traditional reactive constraints. We evaluate Co2PO across a suite of comple
Authors
(none)
Tags
Stats
Related papers
- Multi-agent Constrained Policy Optimisation (2021)0.00
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)0.00
- Multi-agent Trust Region Policy Optimization (2020)12.61
- Multi-agent Guided Policy Optimization (2025)0.00
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)0.00
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)0.00
- Halypo: Heterogeneous-agent Lyapunov Policy Optimization For Human-robot Collaboration (2026)0.00
- Reward Constrained Policy Optimization (2018)0.00