Reasoning Guided Embeddings: Leveraging MLLM Reasoning For Improved Multimodal Retrieval
2025 · Chunxu Liu, Jiyuan Yang, Ruopeng Gao, et al.
Abstract
Multimodal embeddings are widely used in downstream tasks such as multimodal retrieval, enabling alignment of interleaved modalities in a shared representation space. While recent studies show that Multimodal Large Language Models (MLLMs) can serve as strong embedding extractors, existing approaches treat embedding extraction as a direct encoding step, overlooking the fact that MLLMs possess the generative capability for reasoning that could be leveraged to enhance representation quality. In this work, we explore how to explicitly incorporate reasoning into the embedding process. To this end, we propose Reasoning Guided Embeddings (RGE), which preserves the generative rationale process of MLLMs and couples it with contrastive training. Our method first enables the model to perform structured rationale generation conditioned on the instruction, and then extracts representations after reasoning has unfolded. This simple design enhances the context-conditional inference signals within the
Related papers
- Embed-rl: Reinforcement Learning For Reasoning-driven Multimodal Embeddings (2026)
- Reasoning-augmented Representations For Multimodal Retrieval (2026)
- TRACE: Task-adaptive Reasoning And Representation Learning For Universal Multimodal Retrieval (2026)
- V-retrver: Evidence-driven Agentic Reasoning For Universal Multimodal Retrieval (2026)
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026)
- Reason To Contrast: A Cascaded Multimodal Retrieval Framework (2025)
- PLUME: Latent Reasoning Based Universal Multimodal Embedding (2026)
- MARVEL: Multimodal Adaptive Reasoning-intensive Expand-rerank And Retrieval (2026)