MR\(^2\)-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval

Abstract

Multimodal retrieval is becoming a crucial component of modern AI applications, yet its evaluation lags behind the demands of more realistic and challenging scenarios. Existing benchmarks primarily probe surface-level semantic correspondence (e.g., object-text matching) but fail to assess the deeper reasoning required to capture complex relationships between visual and textual information. To address this gap, we introduce MR\(^2\)-Bench, a reasoning-intensive benchmark for multimodal retrieval. MR\(^2\)-Bench offers the following key features: 1) all tasks are reasoning-driven, going beyond shallow matching to assess models' capacity for logical, spatial, and causal inference; 2) it features diverse multimodal data, such as natural images, diagrams, and visual puzzles, enabling comprehensive evaluation across content types; 3) it supports complex queries and documents containing multiple images and covers diverse retrieval scenarios, more accurately reflecting re
