Closed-loop Vision-language Planning For Multi-agent Coordination
2026 · Zhiyuan Li, Wenshuai Zhao, Joni Pajarinen
Abstract
arXiv:2502.10148v3
Cooperative multi-agent reinforcement learning (MARL) struggles with sample efficiency, interpretability, and generalization. While Large Language Models (LLMs) offer powerful planning capabilities, their application has been hampered by a reliance on text-only inputs and a failure to handle the non-Markovian, partially observable nature of multi-agent tasks. We introduce COMPASS, a multi-agent framework that overcomes these limitations by integrating Vision-Language Models (VLMs) for decentralized, closed-loop decision-making. COMPASS dynamically generates and refines interpretable, code-based strategies stored in a skill library that is bootstrapped from expert demonstrations. To ensure robust coordination, it propagates entity information through a structured multi-hop communication protocol, allowing teams to build a coherent understanding from partial observations. Evaluated on the challenging SMACv2 benchmark, COMPASS significa
Related papers
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025)
- Towards Collaborative Intelligence: Propagating Intentions And Reasoning For Multi-agent Coordination With Large Language Models (2024)
- Communicating Plans, Not Percepts: Scalable Multi-agent Coordination With Embodied World Models (2025)
- Enhancing Vision-language Model Training With Reinforcement Learning In Synthetic Worlds For Real-world Success (2025)
- DLM: Unified Decision Language Models For Offline Multi-agent Sequential Decision Making (2026)
- MARSHAL: Incentivizing Multi-agent Reasoning Via Self-play With Strategic LLMs (2025)
- YOLO-MARL: You Only LLM Once For Multi-agent Reinforcement Learning (2024)
- Bridging MARL To SARL: An Order-independent Multi-agent Transformer Via Latent Consensus (2026)