REAL-MM-RAG: A Real-world Multi-modal Retrieval Benchmark
2025 · Navve Wasserman, Roi Pony, Oshri Naparstek, et al.
Abstract
Accurate multi-modal document retrieval is crucial for Retrieval-Augmented Generation (RAG), yet existing benchmarks, as currently designed, do not fully capture real-world challenges. We introduce REAL-MM-RAG, an automatically generated benchmark designed to address four key properties essential for real-world retrieval: (i) multi-modal documents, (ii) enhanced difficulty, (iii) Realistic-RAG queries, and (iv) accurate labeling. Additionally, we propose a multi-difficulty-level scheme based on query rephrasing to evaluate models' semantic understanding beyond keyword matching. Our benchmark reveals significant model weaknesses, particularly in handling table-heavy documents and in robustness to query rephrasing. To mitigate these shortcomings, we curate a rephrased training set and introduce a new finance-focused, table-heavy dataset. Fine-tuning on these datasets enables models to achieve state-of-the-art retrieval performance on the REAL-MM-RAG benchmark. Our work offers a better way to e
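To illustrate how a multi-difficulty-level scheme based on query rephrasing could be scored, here is a minimal sketch in Python. It is not the authors' evaluation code. The metric (recall@k), the level numbering (0 for the original query, higher numbers for heavier rephrasing), the function names, and the toy page IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant page is among the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def recall_by_level(results, k=5):
    """Average recall@k separately for each rephrasing level.

    `results` is a list of (level, ranked_ids, relevant_id) tuples, where
    `level` is a hypothetical rephrasing level (0 = original wording).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, ranked_ids, relevant_id in results:
        totals[level] += recall_at_k(ranked_ids, relevant_id, k)
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in sorted(totals)}


# Toy, invented retrieval output: two queries, each issued at three levels.
toy_results = [
    (0, ["p1", "p7", "p3"], "p1"),
    (0, ["p4", "p2", "p9"], "p2"),
    (1, ["p7", "p1", "p3"], "p1"),
    (1, ["p4", "p9", "p8"], "p2"),
    (2, ["p7", "p3", "p5"], "p1"),
    (2, ["p4", "p9", "p8"], "p2"),
]

scores = recall_by_level(toy_results, k=2)
for level, score in scores.items():
    print(f"level {level}: recall@2 = {score:.2f}")
```

A score that falls as the level rises would suggest the retriever leans on keyword overlap rather than semantic understanding, which is the kind of weakness the abstract describes.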
Related papers
- Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025) 0.00
- From BM25 To Corrective RAG: Benchmarking Retrieval Strategies For Text-and-table Documents (2026) 0.00
- MRMR: A Realistic And Expert-level Multidisciplinary Benchmark For Reasoning-intensive Multimodal Retrieval (2025) 0.00
- M3Retrieve: Benchmarking Multimodal Retrieval For Medicine (2025) 2.16
- CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark (2025) 0.00
- RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance (2025) 0.00
- Visual-RAG: Benchmarking Text-to-image Retrieval Augmented Generation For Visual Knowledge Intensive Queries (2025) 0.00
- MM-BRIGHT: A Multi-task Multimodal Benchmark For Reasoning-intensive Retrieval (2026) 2.60