HetaRAG: Hybrid Deep Retrieval-Augmented Generation Across Heterogeneous Data Stores
2025 · Guohang Yan, Yue Zhang, Pinlong Cai, et al.
Abstract
Retrieval-augmented generation (RAG) has become a dominant paradigm for mitigating knowledge hallucination and staleness in large language models (LLMs) while preserving data security. By retrieving relevant evidence from private, domain-specific corpora and injecting it into carefully engineered prompts, RAG delivers trustworthy responses without the prohibitive cost of fine-tuning. Traditional RAG systems are text-only and often rely on a single storage backend, most commonly a vector database. In practice, this monolithic design suffers from unavoidable trade-offs: vector search captures semantic similarity yet loses global context; knowledge graphs excel at relational precision but struggle with recall; full-text indexes are fast and exact yet semantically blind; and relational engines such as MySQL provide strong transactional guarantees but no semantic understanding. We argue that these heterogeneous retrieval paradigms are complementary, and prop
Related papers
- Advancing Retrieval-augmented Generation For Structured Enterprise And Internal Data (2025)
- Universalrag: Retrieval-augmented Generation Over Corpora Of Diverse Modalities And Granularities (2025)
- Neurosymbolic Retrievers For Retrieval-augmented Generation (2026)
- Ragdb: A Zero-dependency, Embeddable Architecture For Multimodal Retrieval-augmented Generation On The Edge (2025)
- Erarag: Efficient And Incremental Retrieval Augmented Generation For Growing Corpora (2025)
- Rag-check: Evaluating Multimodal Retrieval Augmented Generation Performance (2025)
- Multimodal RAG For Unstructured Data: Leveraging Modality-aware Knowledge Graphs With Hybrid Retrieval (2025)
- HASH-RAG: Bridging Deep Hashing With Retriever For Efficient, Fine Retrieval And Augmented Generation (2025)