Multi-Head RAG: Solving Multi-Aspect Problems With LLMs
2024 · Maciej Besta, Ales Kubicek, Robert Gerstenberger, et al.
Abstract
Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by retrieving supporting documents into the prompt, but existing methods do not explicitly target queries that require fetching multiple documents with substantially different content. Such multi-aspect queries are challenging because the relevant documents can be far apart in embedding space, which makes it difficult to retrieve them together. We introduce Multi-Head RAG (MRAG), which addresses this gap with a simple yet powerful idea: using the activations of the Transformer's multi-head attention layer, rather than the standard decoder-layer embedding, as retrieval keys. It leverages the observation that different attention heads capture different semantic aspects. This yields multi-aspect embeddings for both documents and queries, improving retrieval accuracy on complex queries. We show MRAG's design advantages over 18 RAG baselines, up to 20% higher retrieval success ratios for real-world use cases, and improved downstream LLM generation. MRAG integrates sea
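The core idea in the abstract, using one embedding per attention head as a separate retrieval key, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy vectors stand in for real attention-head activations (extracting them from a Transformer is not shown), the helper names `split_heads` and `multi_head_retrieve` are hypothetical, and the reciprocal-rank voting used to merge per-head results is a stand-in, not necessarily the scheme the paper uses.

```python
import math


def split_heads(activation, num_heads):
    """Split one concatenated multi-head activation into per-head vectors."""
    if len(activation) % num_heads != 0:
        raise ValueError("activation length must be divisible by num_heads")
    dim = len(activation) // num_heads
    return [activation[h * dim:(h + 1) * dim] for h in range(num_heads)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def multi_head_retrieve(query_act, doc_acts, num_heads, top_k=2):
    """Rank documents per head, then merge the rankings by voting.

    Assumption: each head contributes a reciprocal-rank vote (1, 1/2, ...)
    for its own top_k documents. This merging rule is hypothetical.
    """
    query_heads = split_heads(query_act, num_heads)
    doc_heads = [split_heads(d, num_heads) for d in doc_acts]
    votes = [0.0] * len(doc_acts)
    for h in range(num_heads):
        # Similarity is computed in this head's embedding space only.
        sims = [cosine(query_heads[h], d[h]) for d in doc_heads]
        ranked = sorted(range(len(doc_acts)), key=lambda i: -sims[i])[:top_k]
        for rank, doc_idx in enumerate(ranked):
            votes[doc_idx] += 1.0 / (rank + 1)
    # sorted() is stable, so ties keep their original document order.
    return sorted(range(len(doc_acts)), key=lambda i: -votes[i])[:top_k]


# Toy data: 2 heads of dimension 2. The query has one aspect per head.
query = [1, 0, 0, 1]
docs = [
    [1, 0, 1, 0],    # matches the query only in head 0
    [0, 1, 0, 1],    # matches the query only in head 1
    [-1, 0, 0, -1],  # opposite to the query in both heads
]
top_docs = multi_head_retrieve(query, docs, num_heads=2, top_k=2)
```

In this toy setup each of the first two documents is the top match for a different head, so both are selected, while the third is ranked last in every head. This mirrors the multi-aspect case the abstract describes, where the relevant documents differ substantially from one another.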
Related papers
- MG\(^2\)-RAG: Multi-granularity Graph For Multimodal Retrieval-augmented Generation (2026)
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)
- RegionRAG: Region-level Retrieval-augmented Generation For Visual Document Understanding (2025)
- Domain-aware RAG: MoL-enhanced RL For Efficient Training And Scalable Retrieval (2025)
- Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025)
- SV-RAG: LoRA-contextualizing Adaptation Of MLLMs For Long Document Understanding (2024)
- HetaRAG: Hybrid Deep Retrieval-augmented Generation Across Heterogeneous Data Stores (2025)
- SRAG: RAG With Structured Data Improves Vector Retrieval (2026)