Hybrid Inverted Index Is A Robust Accelerator For Dense Retrieval
2022 · Peitian Zhang, Zheng Liu, Shitao Xiao, et al.
Abstract
The inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; at search time, it probes the clusters nearest to the input query and evaluates only the documents within them using subsequent codecs, thus avoiding the cost of exhaustive traversal. However, the clustering is always lossy, so relevant documents can be missing from the probed clusters, which degrades retrieval quality. In contrast, lexical matching, such as overlap of salient terms, tends to be a strong feature for identifying relevant documents. In this work, we present the Hybrid Inverted Index (HI\(^2\)), where the embedding clusters and salient terms work collaboratively to accelerate dense retrieval. To make the best of both effectiveness and efficiency, we devise a cluster selector and a term selector to construct compact inverted lists and search through them efficiently. Moreover, we leverage simple unsupervised algorithms as well as en
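The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `HybridInvertedIndex`, the fixed centroids, the nearest-centroid assignment, and the use of raw query terms are all assumptions made for illustration. In particular, the learned cluster selector and term selector are replaced here by simple heuristics, and the "subsequent codecs" are replaced by an exact inner product.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class HybridInvertedIndex:
    """Toy index (hypothetical): each document is reachable through one
    cluster list and through the lists of its salient terms."""

    def __init__(self, centroids):
        self.centroids = centroids                # assumed to be given, not learned
        self.cluster_lists = defaultdict(list)    # cluster id -> doc ids
        self.term_lists = defaultdict(list)       # term -> doc ids
        self.embeddings = {}                      # doc id -> embedding

    def add(self, doc_id, embedding, salient_terms):
        self.embeddings[doc_id] = embedding
        # Stand-in for a cluster selector: pick the closest centroid.
        best = max(range(len(self.centroids)),
                   key=lambda c: dot(embedding, self.centroids[c]))
        self.cluster_lists[best].append(doc_id)
        # Stand-in for a term selector: index every term the caller passes in.
        for term in set(salient_terms):
            self.term_lists[term].append(doc_id)

    def search(self, query_emb, query_terms, n_probe=1, top_k=3):
        # Probe the n_probe clusters closest to the query.
        ranked = sorted(range(len(self.centroids)),
                        key=lambda c: dot(query_emb, self.centroids[c]),
                        reverse=True)
        candidates = set()
        for c in ranked[:n_probe]:
            candidates.update(self.cluster_lists.get(c, []))
        # Add documents that share a term with the query.
        for term in set(query_terms):
            candidates.update(self.term_lists.get(term, []))
        # Evaluate only the candidates (exact scoring instead of a codec).
        scored = sorted(candidates,
                        key=lambda d: dot(query_emb, self.embeddings[d]),
                        reverse=True)
        return scored[:top_k]


index = HybridInvertedIndex(centroids=[[1.0, 0.0], [0.0, 1.0]])
index.add("d1", [0.9, 0.1], ["dense", "retrieval"])
index.add("d2", [0.2, 0.8], ["inverted", "index"])
index.add("d3", [0.45, 0.55], ["dense", "index"])

query = [0.6, 0.4]
cluster_only = index.search(query, query_terms=[], n_probe=1, top_k=2)
hybrid = index.search(query, query_terms=["index"], n_probe=1, top_k=2)
print(cluster_only)  # ['d1']
print(hybrid)        # ['d1', 'd3']
```

In this toy data, `d3` falls into the cluster that is not probed, so the cluster-only search misses it; the term list for "index" brings it back into the candidate set, which is the kind of loss the hybrid design is meant to reduce.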
Related papers
- EHI: End-to-end Learning Of Hierarchical Index For Efficient Dense Retrieval (2023) 0.00
- SLIM: Sparsified Late Interaction For Multi-vector Retrieval With Inverted Indexes (2023) 7.50
- Efficient Inverted Indexes For Approximate Retrieval Over Learned Sparse Representations (2024) 11.67
- Operational Advice For Dense And Sparse Retrievers: HNSW, Flat, Or Inverted Indexes? (2024) 0.00
- DeeperImpact: Optimizing Sparse Learned Index Structures (2024) 0.00
- Billion-scale Similarity Search Using A Hybrid Indexing Approach With Advanced Filtering (2025) 4.52
- All-in-one Graph-based Indexing For Hybrid Search On GPUs (2025) 0.00
- Pairing Clustered Inverted Indexes With kNN Graphs For Fast Approximate Retrieval Over Learned Sparse Representations (2024) 7.50