Abstract

Given a vector dataset \(\mathcal\{X\}\) and a query vector \(\vec\{x\}_q\), graph-based Approximate Nearest Neighbor Search (ANNS) aims to build a graph index \(G\) and approximately return vectors with minimum distances to \(\vec\{x\}_q\) by searching over \(G\). The main drawback of graph-based ANNS is that a graph index would be too large to fit into the memory especially for a large-scale \(\mathcal\{X\}\). To solve this, a Product Quantization (PQ)-based hybrid method called DiskANN is proposed to store a low-dimensional PQ index in memory and retain a graph index in SSD, thus reducing memory overhead while ensuring a high search accuracy. However, it suffers from two I/O issues that significantly affect the overall efficiency: (1) long routing path from an entry vertex to the query's neighborhood that results in large number of I/O requests and (2) redundant I/O requests during the routing process. We propose an optimized DiskANN++ to overcome above issues. Specifically, for the

Authors

(none)

Tags

  • ANN Search
  • Product Quantization

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keyni2023diskann

Related papers