U-MARVEL: Unveiling Key Factors For Universal Multimodal Retrieval Via Embedding Learning With Mllms
2025 Β· Xiaojie Li, Chu Li, Shi-Zhe Chen, et al.
Abstract
Universal multimodal retrieval (UMR), which aims to address complex retrieval tasks where both queries and candidates span diverse modalities, has been significantly advanced by the emergence of MLLMs. While state-of-the-art MLLM-based methods in the literature predominantly adopt contrastive learning principles, they often differ in their specific training recipes. Despite their success, the mechanisms underlying their retrieval capabilities remain largely unexplored, potentially resulting in suboptimal performance and limited generalization ability. To address these issues, we present a comprehensive study aimed at uncovering the key factors that drive effective embedding learning for UMR using MLLMs. We begin by implementing a general MLLM-based embedding learning pipeline, and systematically analyze the primary contributors to high-performing universal retrieval systems. Based on this, we explore various aspects of the details in embedding generation and training strategies, includ
Authors
(none)
Tags
Stats
Related papers
- GME: Improving Universal Multimodal Retrieval By Multimodal Llms (2024)0.00
- Mm-embed: Universal Multimodal Retrieval With Multimodal Llms (2024)0.00
- Magic-mm-embedding: Towards Visual-token-efficient Universal Multimodal Embedding With Mllms (2026)0.00
- Breaking The Modality Barrier: Universal Embedding Learning With Multimodal Llms (2025)4.52
- Unime-v2: Mllm-as-a-judge For Universal Multimodal Embedding Learning (2025)0.00
- RETLLM: Training And Data-free Mllms For Multimodal Information Retrieval (2026)1.57
- Modality Curation: Building Universal Embeddings For Advanced Multimodal Information Retrieval (2025)0.00
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026)0.00