ChatBridge: Bridging Modalities With Large Language Model As A Language Catalyst
2023 · Zijia Zhao, Longteng Guo, Tongtian Yue, et al.
Abstract
Building general-purpose models that can perceive diverse real-world modalities and solve various tasks is an appealing target in artificial intelligence. In this paper, we present ChatBridge, a novel multimodal language model that leverages the expressive capabilities of language as the catalyst to bridge the gap between various modalities. We show that language-paired two-modality data alone is sufficient to connect all modalities. ChatBridge leverages recent large language models (LLMs) and extends their zero-shot capabilities to incorporate diverse multimodal inputs. ChatBridge undergoes two-stage training. The first stage aligns each modality with language, which brings emergent multimodal correlation and collaboration abilities. The second stage instruction-finetunes ChatBridge to align it with user intent using our newly proposed multimodal instruction tuning dataset, named MULTIS, which covers 16 multimodal tasks across the text, image, video, and audio modalities. We sh
Related papers
- SpeechGPT: Empowering Large Language Models With Intrinsic Cross-modal Conversational Abilities (2023)
- Multimodal Large Language Models: A Survey (2023)
- X-LLM: Bootstrapping Advanced Large Language Models By Treating Multi-modalities As Foreign Languages (2023)
- Macaw-LLM: Multi-modal Language Modeling With Image, Audio, Video, And Text Integration (2023)
- CACARA: Cross-modal Alignment Leveraging A Text-centric Approach For Cost-effective Multimodal And Multilingual Learning (2025)
- LLMs Meet Multimodal Generation And Editing: A Survey (2024)
- Towards Multi-modal Mastery: A 4.5B Parameter Truly Multi-modal Small Language Model (2024)
- Mixture-of-Transformers: A Sparse And Scalable Architecture For Multi-modal Foundation Models (2024)