Xrag: Extreme Context Compression For Retrieval-augmented Generation With One Token
2024 Β· Xin Cheng, Xun Wang, Xingxing Zhang, et al.
Abstract
This paper introduces xRAG, an innovative context compression method tailored for retrieval-augmented generation. xRAG reinterprets document embeddings in dense retrieval--traditionally used solely for retrieval--as features from the retrieval modality. By employing a modality fusion methodology, xRAG seamlessly integrates these embeddings into the language model representation space, effectively eliminating the need for their textual counterparts and achieving an extreme compression rate. In xRAG, the only trainable component is the modality bridge, while both the retriever and the language model remain frozen. This design choice allows for the reuse of offline-constructed document embeddings and preserves the plug-and-play nature of retrieval augmentation. Experimental results demonstrate that xRAG achieves an average improvement of over 10% across six knowledge-intensive tasks, adaptable to various language model backbones, ranging from a dense 7B model to an 8x7B Mixture of Experts
Authors
(none)
Tags
Stats
Related papers
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)0.00
- Ragdb: A Zero-dependency, Embeddable Architecture For Multimodal Retrieval-augmented Generation On The Edge (2025)0.00
- Slimrag: Retrieval Without Graphs Via Entity-aware Context Selection (2025)1.91
- Universalrag: Retrieval-augmented Generation Over Corpora Of Diverse Modalities And Granularities (2025)0.00
- PCA-RAG: Principal Component Analysis For Efficient Retrieval-augmented Generation (2025)0.00
- Hetarag: Hybrid Deep Retrieval-augmented Generation Across Heterogeneous Data Stores (2025)3.27
- Cross-modal RAG: Sub-dimensional Text-to-image Retrieval-augmented Generation (2025)0.00
- Optimizing Retrieval-augmented Generation: Analysis Of Hyperparameter Impact On Performance And Efficiency (2025)0.00