DLM: Unified Decision Language Models For Offline Multi-agent Sequential Decision Making
2026 · Zhuohui Zhang, Bin Cheng, Bin He
Abstract
arXiv:2604.23557v1
Building scalable and reusable multi-agent decision policies from offline datasets remains a challenge in offline multi-agent reinforcement learning (MARL), as existing methods often rely on fixed observation formats and action spaces that limit generalization. In contrast, large language models (LLMs) offer a flexible modeling interface that can naturally accommodate heterogeneous observations and actions. Motivated by this, we propose the Decision Language Model (DLM), which formulates multi-agent decision making as a dialogue-style sequence prediction problem under the centralized training with decentralized execution paradigm. DLM is trained in two stages: a supervised fine-tuning phase, which leverages dialogue-style datasets for centralized training with inter-agent context and generates executable actions from offline trajectories, followed by a group relative policy optimization phase to enhance robustness to out-of-distributio
Related papers
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025)
- Tompo: Training LLM Strategic Decision Making From A Multi-agent Perspective (2025)
- YOLO-MARL: You Only LLM Once For Multi-agent Reinforcement Learning (2024)
- Himac: Hierarchical Macro-micro Learning For Long-horizon LLM Agents (2026)
- Mental Modeling Of Reinforcement Learning Agents By Language Models (2024)
- Closed-loop Vision-language Planning For Multi-agent Coordination (2026)
- Offline Pre-trained Multi-agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks (2021)
- End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025)