MultiWay-Adapter: Adapting Large-scale Multi-modal Models For Scalable Image-text Retrieval
2023 · Zijun Long, George Killick, Richard McCreadie, et al.
Abstract
As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. Indeed, traditional fine-tuning methods are costly because they require extensive, task-specific training. While efficient adaptation methods exist that aim to reduce these costs, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these computational challenges and improve inter-modal alignment, we introduce the MultiWay-Adapter (MWA), a novel framework featuring an 'Alignment Enhancer'. This enhancer deepens inter-modal alignment, enabling high transferability with minimal tuning effort. Our experiments show that, unlike prior efficient tuning approaches, MWA maintains model effectiveness while reducing training time by up to 57%. MWA is also lightweight, increasing model size by only 2-3% (in terms of parameters) for state-of-the-art foundation models lik