EdgeRAG: Online-Indexed RAG for Edge Devices
2024 · Korakit Seemakhupt, Sihang Liu, Samira Khan
Abstract
Deploying Retrieval-Augmented Generation (RAG) on resource-constrained edge devices is challenging because of their limited memory and processing power. In this work, we propose EdgeRAG, which addresses the memory constraint by pruning embeddings within clusters and generating them on demand during retrieval. To avoid the latency of generating embeddings for large tail clusters, EdgeRAG pre-computes and stores the embeddings for these clusters, and it adaptively caches the remaining embeddings to minimize redundant computation and further reduce latency. Results on the BEIR benchmark suite show that EdgeRAG significantly reduces latency relative to the baseline IVF index while maintaining similar generation quality and allowing every evaluated dataset to fit into memory.
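The retrieval flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name OnlineIndexSketch, the toy_embed function (a hashed bag-of-words stand-in for a real embedding model), the chunk-count threshold used to decide which clusters are "large", and the plain LRU cache used in place of the paper's adaptive caching policy are all assumptions made for illustration.

```python
import math
from collections import OrderedDict


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class OnlineIndexSketch:
    """Two-level index: centroids are always kept, per-chunk embeddings mostly are not."""

    def __init__(self, clusters, embed=toy_embed, large_cluster_size=4, cache_capacity=2):
        self.clusters = clusters          # cluster id -> list of text chunks
        self.embed = embed
        self.centroids = {}               # cluster id -> normalised centroid vector
        self.stored = {}                  # precomputed embeddings for large clusters only
        self.cache = OrderedDict()        # LRU cache for embeddings generated on demand
        self.cache_capacity = cache_capacity
        self.generated = 0                # number of query-time embedding computations
        for cid, chunks in clusters.items():
            embs = [embed(t) for t in chunks]
            mean = [sum(col) / len(embs) for col in zip(*embs)]
            norm = math.sqrt(dot(mean, mean)) or 1.0
            self.centroids[cid] = [v / norm for v in mean]
            # Assumption: chunk count is a proxy for how costly regeneration would be.
            if len(chunks) >= large_cluster_size:
                self.stored[cid] = embs
            # Otherwise the embeddings are pruned (dropped) after the centroid is built.

    def _cluster_embeddings(self, cid):
        if cid in self.stored:            # large cluster: read the stored embeddings
            return self.stored[cid]
        if cid in self.cache:             # recently used cluster: reuse cached embeddings
            self.cache.move_to_end(cid)
            return self.cache[cid]
        embs = [self.embed(t) for t in self.clusters[cid]]   # generate on demand
        self.generated += len(embs)
        self.cache[cid] = embs
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)                   # evict least recently used
        return embs

    def search(self, query, top_k=1):
        q = self.embed(query)
        # First level: pick the cluster whose centroid is most similar to the query.
        cid = max(self.centroids, key=lambda c: dot(q, self.centroids[c]))
        # Second level: rank the chunks inside that cluster.
        embs = self._cluster_embeddings(cid)
        ranked = sorted(zip(self.clusters[cid], embs), key=lambda p: dot(q, p[1]), reverse=True)
        return cid, [text for text, _ in ranked[:top_k]]


index = OnlineIndexSketch({
    "animals": ["cat", "dog"],                  # small: embeddings pruned after build
    "objects": ["map", "key", "pen", "hat"],    # large: embeddings precomputed and stored
    "misc": ["sun", "ink"],
})
first = index.search("dog")
second = index.search("cat")    # served from the cache, no new embedding work
third = index.search("pen")     # served from the stored embeddings
```

In this sketch only the first query triggers embedding generation; the second hits the cache and the third reads precomputed embeddings. A real deployment would probe several clusters per query and would size the cache and the storage threshold from measured embedding latency rather than from chunk counts.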
Related papers
- Ragdb: A Zero-dependency, Embeddable Architecture For Multimodal Retrieval-augmented Generation On The Edge (2025) 0.00
- Cimrag: Cim-aware Domain-adaptive And Noise-resilient Retrieval-augmented Generation For Edge-based Llms (2026) 0.00
- Xrag: Extreme Context Compression For Retrieval-augmented Generation With One Token (2024) 7.81
- Slimrag: Retrieval Without Graphs Via Entity-aware Context Selection (2025) 1.91
- RAG Without Forgetting: Continual Query-infused Key Memory (2026) 0.00
- Frustratingly Simple Retrieval Improves Challenging, Reasoning-intensive Benchmarks (2025) 0.00
- Optimizing Retrieval For RAG Via Reinforcement Learning (2025) 0.00
- Erarag: Efficient And Incremental Retrieval Augmented Generation For Growing Corpora (2025) 4.51