GleanVec: Accelerating Vector Search With Minimalist Nonlinear Dimensionality Reduction
2024 · Mariano Tepper, Ishwar Singh Bhati, Cecilia Aguerrebere, et al.
Abstract
Embedding models can generate high-dimensional vectors whose similarity reflects semantic affinities. Thus, retrieving the vectors in a large collection that are similar to a given query, accurately and in a timely manner, has become a critical component of a wide range of applications. In particular, cross-modal retrieval (e.g., where a text query is used to find images) is rapidly gaining momentum. Here, it is challenging to achieve high accuracy, as the queries often have different statistical distributions than the database vectors. Moreover, the high vector dimensionality puts these search systems under compute and memory pressure, leading to subpar performance. In this work, we present new linear and nonlinear methods for dimensionality reduction to accelerate high-dimensional vector search while maintaining accuracy in settings with in-distribution (ID) and out-of-distribution (OOD) queries. The linear LeanVec-Sphering outperforms other linear methods, trains faster, comes with no hyperparam
Related papers
- LeanVec: Searching Vectors Faster By Making Them Fit (2023)
- VectorSearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)
- Breaking The Curse Of Dimensionality: On The Stability Of Modern Vector Retrieval (2025)
- Semantic Vector Encoding And Similarity Search Using Fulltext Search Engines (2017)
- Lucene For Approximate Nearest-neighbors Search On Arbitrary Dense Vectors (2019)
- Reveal Hidden Pitfalls And Navigate Next Generation Of Vector Similarity Search From Task-centric Views (2025)
- On Strengths And Limitations Of Single-vector Embeddings (2026)
- Dimensionality-reduction Techniques For Approximate Nearest Neighbor Search: A Survey And Evaluation (2024)