Training-free Multimodal Large Language Model Orchestration
2025 · Tianyu Xie, Yuhang Wu, Yongdong Luo, et al.
Abstract
Different Multimodal Large Language Models (MLLMs) cannot be integrated into a unified multimodal input-output system directly. In previous work, training has been considered as an inevitable component due to challenges in modal alignment, Text-to-Speech efficiency and other integration issues. In this paper, we introduce Multimodal Large Language Model Orchestration, an effective approach for creating interactive multimodal AI systems without additional training. MLLM Orchestration leverages the inherent reasoning capabilities of large language models to coordinate specialized models through explicit workflows, enabling natural multimodal interactions while maintaining modularity, improving interpretability, and significantly enhancing computational efficiency. Our orchestration framework is built upon three key innovations: (1) a central controller LLM that analyzes user inputs and dynamically routes tasks to appropriate specialized models through carefully designed agents; (2) a par
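The controller-and-routing idea in item (1) can be illustrated with a minimal sketch. It is not the authors' implementation: the route names, the handler functions, and the keyword-based `controller` are all hypothetical stand-ins. In the described system the routing decision would come from a controller LLM, and each handler would call a real specialized model.

```python
from typing import Callable, Dict, Tuple

# Hypothetical specialized "models": each stub only labels the request.
def describe_image(request: str) -> str:
    return f"[vision model] {request}"

def synthesize_speech(request: str) -> str:
    return f"[tts model] {request}"

def answer_text(request: str) -> str:
    return f"[text model] {request}"

# Explicit workflow table: route name -> handler (names are assumptions).
ROUTES: Dict[str, Callable[[str], str]] = {
    "vision": describe_image,
    "speech": synthesize_speech,
    "text": answer_text,
}

def controller(request: str) -> str:
    """Stand-in for the controller LLM: map a request to a route name.

    A real controller would prompt an LLM and parse its answer; simple
    keyword matching is used here only so the sketch runs offline.
    """
    tokens = set(request.lower().split())
    if tokens & {"image", "picture", "photo"}:
        return "vision"
    if tokens & {"aloud", "speak"}:
        return "speech"
    return "text"

def orchestrate(request: str) -> Tuple[str, str]:
    """Route the request and return (route name, handler output)."""
    route = controller(request)
    # Fall back to the text handler if the controller names an unknown route.
    handler = ROUTES.get(route, answer_text)
    return route, handler(request)

if __name__ == "__main__":
    for prompt in ("Describe this image", "Read the summary aloud", "What is MLLM?"):
        print(orchestrate(prompt))
```

Keeping the routing table explicit is what makes such a design modular and inspectable: swapping a specialized model changes one entry in `ROUTES`, and every routing decision can be logged by name.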
Related papers
- Multimodal Large Language Models: A Survey (2023) 0.00
- Teaching A Multilingual Large Language Model To Understand Multilingual Speech Via Multi-instructional Training (2024) 0.00
- Discrete Multimodal Transformers With A Pretrained Large Language Model For Mixed-supervision Speech Processing (2024) 0.00
- A Review Of Multi-modal Large Language And Vision Models (2024) 0.00
- X-LLM: Bootstrapping Advanced Large Language Models By Treating Multi-modalities As Foreign Languages (2023) 0.00
- LLMs Meet Multimodal Generation And Editing: A Survey (2024) 5.48
- C3LLM: Conditional Multimodal Content Generation Using Large Language Models (2024) 0.00
- Large Language Models Are Strong Audio-visual Speech Recognition Learners (2024) 9.59