SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search
2021 Β· Qi Chen, Bing Zhao, Haidong Wang, et al.
Abstract
The in-memory algorithms for approximate nearest neighbor search (ANNS) have achieved great success for fast high-recall search, but are extremely expensive when handling very large scale database. Thus, there is an increasing request for the hybrid ANNS solutions with small memory and inexpensive solid-state drive (SSD). In this paper, we present a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology. It stores the centroid points of the posting lists in the memory and the large posting lists in the disk. We guarantee both disk-access efficiency (low latency) and high recall by effectively reducing the disk-access number and retrieving high-quality posting lists. In the index-building stage, we adopt a hierarchical balanced clustering algorithm to balance the length of posting lists and augment the posting list by adding the points in the closure of the corresponding clusters. In the search stage, we use a query-a
Authors
(none)
Tags
Stats
Related papers
- Aisaq: All-in-storage ANNS With Product Quantization For Dram-free Information Retrieval (2024)0.00
- A Memory-efficient Distributed Algorithm For Approximate Nearest Neighbour Search With Arbitrary Distances (2024)0.00
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph (2017)18.20
- Fusionanns: An Efficient CPU/GPU Cooperative Processing Architecture For Billion-scale Approximate Nearest Neighbor Search (2024)0.00
- Freshdiskann: A Fast And Accurate Graph-based ANN Index For Streaming Similarity Search (2021)0.00
- Diskann++: Efficient Page-based Search Over Isomorphic Mapped Graph Index Using Query-sensitivity Entry Vertex (2023)0.00
- DGAI: Decoupled On-disk Graph-based ANN Index For Efficient Updates And Queries (2025)0.00
- Ood-diskann: Efficient And Scalable Graph ANNS For Out-of-distribution Queries (2022)0.00