Index-based, High-dimensional, Cosine Threshold Querying With Optimality Guarantees
2018 · Yuliang Li, Jianguo Wang, Benjamin Pullman, et al.
Abstract
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold $\theta$. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin's well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified for $\theta$-similarity. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be
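The baseline described in the abstract (build an inverted index, traverse it partially to gather candidates, then verify them) can be illustrated with a minimal Python sketch. It is not the paper's optimized algorithm. It assumes, purely for illustration, sparse vectors stored as dicts with non-negative entries and unit length, so that cosine similarity equals the dot product. It also treats "above the threshold" as `>= theta`. The names `build_index`, `cosine_threshold_query`, and `dot` are hypothetical and do not come from the source.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse vector (dict: dimension -> value) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {d: v / norm for d, v in vec.items()}


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(v * b.get(d, 0.0) for d, v in a.items())


def build_index(db):
    """Pre-processing: one posting list per dimension, sorted by value descending."""
    index = defaultdict(list)
    for vid, vec in db.items():
        for dim, val in vec.items():
            if val > 0:
                index[dim].append((val, vid))
    for postings in index.values():
        postings.sort(reverse=True)
    return index


def cosine_threshold_query(db, index, q, theta):
    """TA-style traversal (assumes non-negative unit vectors), then verification."""
    dims = [d for d, w in q.items() if w > 0 and d in index]
    pos = {d: 0 for d in dims}  # next unread position in each posting list
    candidates = set()

    def frontier(d):
        # Value of the next unread entry; 0.0 once the list is exhausted.
        postings = index[d]
        return postings[pos[d]][0] if pos[d] < len(postings) else 0.0

    # An unseen vector s satisfies s[d] <= frontier(d) on every query dimension,
    # so its score is at most sum(q[d] * frontier(d)). Stop once that bound
    # drops below theta.
    while sum(q[d] * frontier(d) for d in dims) >= theta:
        advanced = False
        for d in dims:  # round-robin: read one entry from each list
            if pos[d] < len(index[d]):
                candidates.add(index[d][pos[d]][1])
                pos[d] += 1
                advanced = True
        if not advanced:  # every list exhausted (only reachable if theta <= 0)
            break

    # Verification: compute the exact similarity for each gathered candidate.
    return sorted(vid for vid in candidates if dot(q, db[vid]) >= theta)


# Tiny illustrative database of unit vectors (made-up data).
db = {"a": {0: 1.0}, "b": {0: 0.6, 1: 0.8}, "c": {1: 1.0}, "d": {2: 1.0}}
index = build_index(db)
query = normalize({0: 1.0, 1: 1.0})
print(cosine_threshold_query(db, index, query, 0.9))
```

In this sketch the stopping test uses the plain TA bound, the weighted sum of the list frontiers. Vector `d` is never touched because it shares no dimension with the query, and traversal can stop before the posting lists are exhausted when the bound falls below the threshold.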
Related papers
- Fast Cosine Similarity Search In Binary Space With Angular Multi-index Hashing (2016)
- Breaking The Curse Of Dimensionality: On The Stability Of Modern Vector Retrieval (2025)
- Reveal Hidden Pitfalls And Navigate Next Generation Of Vector Similarity Search From Task-centric Views (2025)
- Cos-mix: Cosine Similarity And Distance Fusion For Improved Information Retrieval (2024)
- Adaptive Prefiltering For High-dimensional Similarity Search: A Frequency-aware Approach (2025)
- Fast Top-k Cosine Similarity Search Through XOR-friendly Binary Quantization On GPUs (2020)
- Probabilistic Kernel Function For Fast Angle Testing (2025)
- Optimization Of Latent-space Compression Using Game-theoretic Techniques For Transformer-based Vector Search (2025)