PCA-RAG: Principal Component Analysis For Efficient Retrieval-augmented Generation

2025


Abstract

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for grounding large language models in external knowledge sources, improving the precision of agents' responses. However, high-dimensional language model embeddings, often in the range of hundreds to thousands of dimensions, can present scalability challenges in terms of storage and latency, especially when processing massive financial text corpora. This paper investigates the use of Principal Component Analysis (PCA) to reduce embedding dimensionality, thereby mitigating computational bottlenecks without incurring large accuracy losses. We experiment with a real-world dataset and compare different similarity and distance metrics under both full-dimensional and PCA-compressed embeddings. Our results show that reducing vectors from 3,072 to 110 dimensions provides a sizeable (up to \(60\times\)) speedup in retrieval operations and a \(\sim 28.6\times\) reduction in index size, with only moderate declines in correlati
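
The compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic Gaussian vectors, the dimensions (256 reduced to 16) are chosen only to keep the example fast rather than to mirror the 3,072 to 110 setting, and the helper names `fit_pca`, `project`, and `cosine_top_k` are hypothetical. PCA is fitted here with a plain SVD of the centered embedding matrix, and retrieval is a brute-force cosine-similarity search.

```python
import numpy as np

def fit_pca(embeddings, n_components):
    """Fit PCA via SVD of the centered matrix; return (mean, components)."""
    mean = embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(vectors, mean, components):
    """Center the vectors and project them onto the retained components."""
    return (vectors - mean) @ components.T

def cosine_top_k(query, index, k):
    """Return the indices of the k rows of `index` most similar to `query`."""
    index_unit = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    scores = index_unit @ query_unit
    return np.argsort(-scores)[:k]

# Synthetic stand-in for document embeddings (assumption, not the paper's data).
rng = np.random.default_rng(0)
docs = rng.normal(size=(500, 256))

mean, components = fit_pca(docs, n_components=16)
docs_small = project(docs, mean, components)

# A query that is a slightly perturbed copy of document 42.
query = docs[42] + 0.01 * rng.normal(size=256)

full_hits = cosine_top_k(query, docs, k=5)
small_hits = cosine_top_k(project(query, mean, components), docs_small, k=5)
size_ratio = docs.nbytes / docs_small.nbytes
```

Comparing `full_hits` with `small_hits` gives a rough sense of how much of the ranking survives compression, and `size_ratio` shows the storage saving for the raw vectors alone; a real index would carry additional overhead.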
