NUDGE: Lightweight Non-parametric Fine-tuning Of Embeddings For Retrieval
2024 · Sepanta Zeighami, Zac Wellmer, Aditya Parameswaran
Abstract
\(k\)-Nearest Neighbor search on dense vector embeddings (\(k\)-NN retrieval) from pre-trained embedding models is the predominant retrieval method for text and images, as well as for Retrieval-Augmented Generation (RAG) pipelines. In practice, application developers often fine-tune the embeddings to improve their accuracy on the dataset and query workload at hand. Existing approaches either fine-tune the pre-trained model itself or train adaptor models that transform the output of the pre-trained model, which is more efficient but less accurate. We present NUDGE, a family of novel non-parametric embedding fine-tuning approaches that are significantly more accurate and efficient than both sets of existing approaches. NUDGE directly modifies the embeddings of data records to maximize the accuracy of \(k\)-NN retrieval. We present a thorough theoretical and experimental study of NUDGE's non-parametric approach. We show that even though the underlying problem is NP-Hard, constrained variat
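To make the idea of "directly modifying the embeddings of data records" concrete, here is a minimal sketch in Python with NumPy. It is not the NUDGE algorithm from the paper. It assumes inner-product \(k\)-NN over unit-norm embeddings, and the update rule (`nudge_toward_queries`, its `step` parameter, and the renormalization) is a hypothetical illustration: each labeled data embedding is shifted toward the queries it should answer, with no model weights trained.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def top_k(data_emb, query_emb, k=1):
    """Indices of the k data records with the highest inner product per query."""
    scores = query_emb @ data_emb.T
    return np.argsort(-scores, axis=1)[:, :k]

def recall_at_1(data_emb, query_emb, labels):
    """Fraction of queries whose top-1 result is the labeled record."""
    return float(np.mean(top_k(data_emb, query_emb, 1)[:, 0] == labels))

def nudge_toward_queries(data_emb, query_emb, labels, step=1.0):
    """Hypothetical non-parametric update (an assumption, not the paper's method):
    move each labeled record toward the direction of its training queries,
    then renormalize. Records with no training query are left unchanged."""
    delta = np.zeros_like(data_emb)
    np.add.at(delta, labels, query_emb)  # sum the queries assigned to each record
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    delta = np.divide(delta, norms, out=np.zeros_like(delta), where=norms > 0)
    return normalize(data_emb + step * delta)

# Toy example: the query should retrieve record 0 but is initially closer to record 1.
data = np.array([[1.0, 0.0], [0.0, 1.0]])
queries = np.array([[0.6, 0.8]])
labels = np.array([0])

print("recall@1 before:", recall_at_1(data, queries, labels))
tuned = nudge_toward_queries(data, queries, labels, step=1.0)
print("recall@1 after: ", recall_at_1(tuned, queries, labels))
```

Only the stored data embeddings change here, so queries are still embedded by the unmodified pre-trained model at retrieval time. A practical method would also need to limit how far embeddings move, so that gains on the training queries do not come at the expense of unseen ones.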
Related papers
- kNN-Embed: Locally Smoothed Embedding Mixtures For Multi-interest Candidate Retrieval (2022)
- Efficient k-NN Search With Cross-encoders Using Adaptive Multi-round CUR Decomposition (2023)
- REFINE On Scarce Data: Retrieval Enhancement Through Fine-tuning Via Model Fusion Of Embedding Models (2024)
- Adaptive Retrieval And Scalable Indexing For k-NN Search With Cross-encoders (2024)
- You Can't Pick Your Neighbors, Or Can You? When And How To Rely On Retrieval In The \(k\)NN-LM (2022)
- Lucene For Approximate Nearest-neighbors Search On Arbitrary Dense Vectors (2019)
- Exploiting Pre-trained Models For Drug Target Affinity Prediction With Nearest Neighbors (2024)
- More Robust Dense Retrieval With Contrastive Dual Learning (2021)