MIRACL-VISION: A Large, Multilingual, Visual Document Retrieval Benchmark
2025 · Radek Osmulski, Gabriel de Souza P. Moreira, Ronay Ak, et al.
Abstract
Document retrieval is an important task for search and Retrieval-Augmented Generation (RAG) applications. Large Language Models (LLMs) have improved the accuracy of text-based document retrieval. However, documents with complex layouts and visual elements such as tables, charts, and infographics are not perfectly represented in textual format. Recently, image-based document retrieval pipelines, which use visual large language models (VLMs) to retrieve relevant page images given a query, have become popular. Current evaluation benchmarks for visual document retrieval are limited: they focus primarily on English, rely on synthetically generated questions, and offer small corpora. We therefore introduce MIRACL-VISION, a multilingual visual document retrieval evaluation benchmark. MIRACL-VISION covers 18 languages and extends the MIRACL dataset, a popular benchmark for evaluating text-based multilingual retrieval pipelines. MIRACL was built using a hum
Related papers
- VisR-Bench: An Empirical Study On Visual Retrieval-augmented Generation For Multilingual Long Document Understanding (2025)
- DocMMIR: A Framework For Document Multi-modal Information Retrieval (2025)
- Unlocking Multimodal Document Intelligence: From Current Triumphs To Future Frontiers Of Visual Document Retrieval (2026)
- ColPali: Efficient Document Retrieval With Vision Language Models (2024)
- MRAG-Bench: Vision-centric Evaluation For Retrieval-augmented Multimodal Models (2024)
- ModernVBERT: Towards Smaller Visual Document Retrievers (2025)
- Vision-deepresearch Benchmark: Rethinking Visual And Textual Search For Multimodal Large Language Models (2026)
- IRPAPERS: A Visual Document Benchmark For Scientific Retrieval And Question Answering (2026)