Freeret: Mllms As Training-free Retrievers
2025 Β· Yuhan Zhu, Xiangyu Zeng, Chenting Wang, et al.
Abstract
Multimodal large language models (MLLMs) are emerging as versatile foundations for mixed-modality retrieval. Yet, they often require heavy post-hoc training to convert them into contrastive encoders for retrieval. This work asks: Can off-the-shelf MLLMs serve as powerful retrievers without additional training? We present FreeRet, a plug-and-play framework that turns any MLLM into a two-stage retriever. FreeRet first derives semantically grounded embeddings directly from the model for fast candidate search, and then exploits its reasoning ability for precise reranking. The framework contributes three advances: bypassing lexical alignment layers to obtain semantically faithful embeddings, conditioning representation generation with explicit priors, and mitigating framing effect in reranking via neutral choice framing. On the MMEB and MMEB-V2 benchmarks spanning 46 datasets, FreeRet substantially outperforms models trained on millions of pairs. Beyond benchmarks, FreeRet is model-agnostic
Authors
(none)
Tags
Stats
Related papers
- RETLLM: Training And Data-free Mllms For Multimodal Information Retrieval (2026)1.57
- Mm-embed: Universal Multimodal Retrieval With Multimodal Llms (2024)0.00
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026)0.00
- Lamra: Large Multimodal Model As Your Advanced Retrieval Assistant (2024)7.50
- SLQ: Bridging Modalities Via Shared Latent Queries For Retrieval With Frozen Mllms (2026)0.00
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026)0.00
- Lightretriever: A Llm-based Text Retrieval Architecture With Extremely Faster Query Inference (2025)0.00
- Generative Giants, Retrieval Weaklings: Why Do Multimodal Large Language Models Fail At Multimodal Retrieval? (2025)0.00