Modality Curation: Building Universal Embeddings For Advanced Multimodal Information Retrieval
2025 · Fanheng Kong, Jingyuan Zhang, Yahui Liu, et al.
Abstract
Multimodal information retrieval (MIR) faces inherent challenges due to the heterogeneity of data sources and the complexity of cross-modal alignment. While previous studies have identified modal gaps in feature spaces, a systematic approach to address these challenges remains unexplored. In this work, we introduce UNITE, a universal framework that tackles these challenges through two critical yet underexplored aspects: data curation and modality-aware training configurations. Our work provides the first comprehensive analysis of how modality-specific data properties influence downstream task performance across diverse scenarios. Moreover, we propose Modal-Aware Masked Contrastive Learning (MAMCL) to mitigate the competitive relationships among the instances of different modalities. Our framework achieves state-of-the-art results on multiple multimodal retrieval benchmarks, outperforming existing methods by notable margins. Through extensive experiments, we demonstrate that strategic m
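The abstract names Modal-Aware Masked Contrastive Learning (MAMCL) but does not give its formula. As a rough illustration only, the sketch below shows one way a modality-aware mask could enter an InfoNCE-style loss: for each query, candidates whose modality differs from that of the query's positive candidate are dropped from the softmax denominator, so they no longer compete as negatives. The masking rule, the function name `masked_info_nce`, and all parameter names are assumptions made for this example; they are not taken from the paper.

```python
import math

def masked_info_nce(sim, pos_idx, cand_modality, temperature=0.05):
    """Hypothetical modality-masked InfoNCE loss (illustrative only).

    sim[i][j]        similarity between query i and candidate j
    pos_idx[i]       index of the positive candidate for query i
    cand_modality[j] modality label of candidate j, e.g. "text" or "image"
    """
    losses = []
    for i, row in enumerate(sim):
        # Assumed rule: keep only candidates that share the modality
        # of this query's positive candidate.
        target_mod = cand_modality[pos_idx[i]]
        kept = [s / temperature
                for s, m in zip(row, cand_modality) if m == target_mod]
        # Numerically stable log-sum-exp over the kept logits.
        peak = max(kept)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in kept))
        # Negative log-probability of the positive candidate.
        losses.append(log_z - row[pos_idx[i]] / temperature)
    return sum(losses) / len(losses)

# One query whose positive is a text candidate; candidate 1 is an image.
sim = [[0.9, 0.8, 0.1]]
masked = masked_info_nce(sim, [0], ["text", "image", "text"], temperature=1.0)
unmasked = masked_info_nce(sim, [0], ["text", "text", "text"], temperature=1.0)
```

In this toy case the image candidate is excluded from the denominator, so `masked` is smaller than `unmasked`: the hard cross-modality negative no longer pulls probability mass away from the positive.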
Related papers
- MM-Embed: Universal Multimodal Retrieval With Multimodal LLMs (2024) 0.00
- GME: Improving Universal Multimodal Retrieval By Multimodal LLMs (2024) 0.00
- Breaking The Modality Barrier: Universal Embedding Learning With Multimodal LLMs (2025) 4.52
- Generalized Contrastive Learning For Universal Multimodal Retrieval (2025) 0.00
- U-MARVEL: Unveiling Key Factors For Universal Multimodal Retrieval Via Embedding Learning With MLLMs (2025) 3.11
- MIRe: Enhancing Multimodal Queries Representation Via Fusion-free Modality Interaction For Multimodal Retrieval (2024) 3.81
- MUST: An Effective And Scalable Framework For Multimodal Search Of Target Modality (2023) 7.81
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026) 0.00