Pairing Clustered Inverted Indexes With Knn Graphs For Fast Approximate Retrieval Over Learned Sparse Representations
2024 Β· Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, et al.
Abstract
Learned sparse representations form an effective and interpretable class of embeddings for text retrieval. While exact top-k retrieval over such embeddings faces efficiency challenges, a recent algorithm called Seismic has enabled remarkably fast, highly-accurate approximate retrieval. Seismic statically prunes inverted lists, organizes each list into geometrically-cohesive blocks, and augments each block with a summary vector. At query time, each inverted list associated with a query term is traversed one block at a time in an arbitrary order, with the inner product between the query and summaries determining if a block must be evaluated. When a block is deemed promising, its documents are fully evaluated with a forward index. Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and significantly outperforms the winning graph-based submissions to the BigANN 2023 Challenge. In this work, we speed up Seismic further by introducing two inn
Authors
(none)
Tags
Stats
Related papers
- Efficient Inverted Indexes For Approximate Retrieval Over Learned Sparse Representations (2024)11.67
- Investigating The Scalability Of Approximate Sparse Retrieval Algorithms To Massive Datasets (2025)5.84
- LIST: Learning To Index Spatio-textual Data For Embedding Based Spatial Keyword Queries (2024)0.00
- Hybrid Inverted Index Is A Robust Accelerator For Dense Retrieval (2022)7.07
- Efficient And Effective Retrieval Of Dense-sparse Hybrid Vectors Using Graph-based Approximate Nearest Neighbor Search (2024)0.00
- Adaptive Retrieval And Scalable Indexing For K-nn Search With Cross-encoders (2024)0.00
- SEINE: Segment-based Indexing For Neural Information Retrieval (2023)0.00
- Lucene For Approximate Nearest-neighbors Search On Arbitrary Dense Vectors (2019)0.00