Advancing Retrieval-augmented Generation For Structured Enterprise And Internal Data
2025 Β· Chandana Cheerla
Abstract
Organizations increasingly rely on proprietary enterprise data, including HR records, structured reports, and tabular documents, for critical decision-making. While Large Language Models (LLMs) have strong generative capabilities, they are limited by static pretraining, short context windows, and challenges in processing heterogeneous data formats. Conventional Retrieval-Augmented Generation (RAG) frameworks address some of these gaps but often struggle with structured and semi-structured data. This work proposes an advanced RAG framework that combines hybrid retrieval strategies using dense embeddings (all-mpnet-base-v2) and BM25, enhanced by metadata-aware filtering with SpaCy NER and cross-encoder reranking. The framework applies semantic chunking to maintain textual coherence and retains tabular data structures to preserve row-column integrity. Quantized indexing optimizes retrieval efficiency, while human-in-the-loop feedback and conversation memory improve adaptability. Exper
Authors
(none)
Tags
Stats
Related papers
- Hetarag: Hybrid Deep Retrieval-augmented Generation Across Heterogeneous Data Stores (2025)3.27
- Erarag: Efficient And Incremental Retrieval Augmented Generation For Growing Corpora (2025)4.51
- Ragdb: A Zero-dependency, Embeddable Architecture For Multimodal Retrieval-augmented Generation On The Edge (2025)0.00
- SRAG: RAG With Structured Data Improves Vector Retrieval (2026)0.00
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)1.57
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)0.00
- Funnelrag: A Coarse-to-fine Progressive Retrieval Paradigm For RAG (2024)3.58
- Multimodal RAG For Unstructured Data:leveraging Modality-aware Knowledge Graphs With Hybrid Retrieval (2025)0.00