MARVEL: Multimodal Adaptive Reasoning-intensive Expand-rerank And Retrieval
2026 · Mahmoud Salaheldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, et al.
Abstract
Multimodal retrieval over text corpora remains a fundamental challenge: the best vision-language encoder achieves only 27.6 nDCG@10 on MM-BRIGHT, a reasoning-intensive multimodal retrieval benchmark, underperforming strong text-only systems. We argue that effective multimodal retrieval requires three tightly integrated capabilities that existing approaches address only in isolation: expanding the query's latent intent, retrieving with a model trained for complex reasoning, and reranking via explicit step-by-step reasoning over candidates. We introduce MARVEL (Multimodal Adaptive Reasoning-intensiVe Expand-rerank and retrievaL), a unified pipeline that combines LLM-driven query expansion, MARVEL-Retriever (a reasoning-enhanced dense retriever fine-tuned for complex multimodal queries), and GPT-4o-based chain-of-thought reranking with optional multi-pass reciprocal rank fusion. Evaluated on MM-BRIG
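The abstract mentions optional multi-pass reciprocal rank fusion (RRF) over reranker outputs. As a rough illustration of the general technique only, here is a minimal Python sketch of standard RRF. The function name, the example document IDs, and the smoothing constant k=60 (a common default in the RRF literature) are assumptions; the source does not state how MARVEL generates its passes or which constant it uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    rankings: list of lists, each ordered best-first (one list per rerank pass).
    k: smoothing constant; 60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by document ID
    # so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical output of three rerank passes over the same candidate pool.
passes = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(passes)
print(fused)
```

Because RRF uses only rank positions, it can combine passes whose raw scores are not comparable, which is one plausible reason to pair it with LLM-based reranking.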
Related papers
- MM-BRIGHT: A Multi-task Multimodal Benchmark For Reasoning-intensive Retrieval (2026) 2.60
- Reasoning-augmented Representations For Multimodal Retrieval (2026) 0.00
- Reason To Contrast: A Cascaded Multimodal Retrieval Framework (2025) 0.00
- V-retrver: Evidence-driven Agentic Reasoning For Universal Multimodal Retrieval (2026) 0.00
- TRACE: Task-adaptive Reasoning And Representation Learning For Universal Multimodal Retrieval (2026) 0.00
- MRMR: A Realistic And Expert-level Multidisciplinary Benchmark For Reasoning-intensive Multimodal Retrieval (2025) 0.00
- MR²-Bench: Going Beyond Matching To Reasoning In Multimodal Retrieval (2025) 1.81
- MARVEL: Unlocking The Multi-modal Capability Of Dense Retrieval Via Visual Module Plugin (2023) 9.04