MG\(^2\)-RAG: Multi-granularity Graph For Multimodal Retrieval-augmented Generation
2026 · Sijun Dai, Qiang Huang, Xiaoxing You, et al.
Abstract
Retrieval-Augmented Generation (RAG) mitigates hallucinations in Multimodal Large Language Models (MLLMs), yet existing systems struggle with complex cross-modal reasoning. Flat vector retrieval often ignores structural dependencies, while current graph-based methods rely on costly "translation-to-text" pipelines that discard fine-grained visual information. To address these limitations, we propose MG\(^2\)-RAG, a lightweight Multi-Granularity Graph RAG framework that jointly improves graph construction, modality fusion, and cross-modal retrieval. MG\(^2\)-RAG constructs a hierarchical multimodal knowledge graph by combining lightweight textual parsing with entity-driven visual grounding, enabling textual entities and visual regions to be fused into unified multimodal nodes that preserve atomic evidence. Building on this representation, we introduce a multi-granularity graph retrieval mechanism that aggregates dense similarities
Related papers
- OMGM: Orchestrate Multiple Granularities And Modalities For Efficient Multimodal Retrieval (2025)
- Multimodal RAG For Unstructured Data: Leveraging Modality-aware Knowledge Graphs With Hybrid Retrieval (2025)
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)
- UniversalRAG: Retrieval-augmented Generation Over Corpora Of Diverse Modalities And Granularities (2025)
- Cross-modal RAG: Sub-dimensional Text-to-image Retrieval-augmented Generation (2025)
- RegionRAG: Region-level Retrieval-augmented Generation For Visual Document Understanding (2025)
- RAG-Check: Evaluating Multimodal Retrieval Augmented Generation Performance (2025)
- EraRAG: Efficient And Incremental Retrieval Augmented Generation For Growing Corpora (2025)