MIRAGE: Runtime Scheduling For Multi-vector Image Retrieval With Hierarchical Decomposition
2025 · Maoliang Li, Ke Li, Yaoyang Liu, et al.
Abstract
To effectively leverage user-specific data, retrieval augmented generation (RAG) is employed in multimodal large language model (MLLM) applications. However, conventional retrieval approaches often suffer from limited retrieval accuracy. Recent advances in multi-vector retrieval (MVR) improve accuracy by decomposing queries and matching against segmented images. However, they remain sub-optimal in both accuracy and efficiency: they overlook the alignment between the query and varying image objects, and they match against redundant fine-grained image segments. In this work, we present MIRAGE, an efficient scheduling framework for image retrieval. First, we introduce a novel hierarchical paradigm that employs multiple intermediate granularities for varying image objects to enhance alignment. Second, we reduce redundancy in retrieval by leveraging cross-hierarchy similarity consistency and hierarchy sparsity to avoid unnecessary matching computation. Furthermore, we configure parameters for each dataset automaticall
Related papers
- MURE: Hierarchical Multi-resolution Encoding Via Vision-language Models For Visual Document Retrieval (2026)
- HMAR: Hierarchical Modality-aware Expert And Dynamic Routing Medical Image Retrieval Architecture (2026)
- MG\(^2\)-RAG: Multi-granularity Graph For Multimodal Retrieval-augmented Generation (2026)
- Hierarchical Matching And Reasoning For Multi-query Image Retrieval (2023)
- OMGM: Orchestrate Multiple Granularities And Modalities For Efficient Multimodal Retrieval (2025)
- Indexing Multimodal Language Models For Large-scale Image Retrieval (2026)
- Retrieval-augmented Perception: High-resolution Image Perception Meets Visual RAG (2025)
- Mcot-mvs: Multi-level Vision Selection By Multi-modal Chain-of-thought Reasoning For Composed Image Retrieval (2026)