Govector: An I/o-efficient Caching Strategy For High-dimensional Vector Nearest Neighbor Search | Awesome Similarity Search Papers

Govector: An I/o-efficient Caching Strategy For High-dimensional Vector Nearest Neighbor Search

Yijie Zhou, Shengyuan Lin, Shufeng Gong, Song Yu, Shuhao Fan, Yanfeng Zhang, Ge Yu Β· International Journal of Software and Informatics Β· 2025

Graph-based high-dimensional vector indices have become a mainstream solution for large-scale approximate nearest neighbor search (ANNS). However, their substantial memory footprint often requires storage on secondary devices, where frequent on-demand loading of graph and vector data leads to I/O becoming the dominant bottleneck, accounting for over 90% of query latency. Existing static caching strategies mitigate this issue only in the initial navigation phase by preloading entry points and multi-hop neighbors, but they fail in the second phase where query-dependent nodes must be dynamically accessed to achieve high recall. We propose GoVector, an I/O-efficient caching strategy tailored for disk-based graph indices. GoVector combines (1) a static cache that stores entry points and frequently accessed neighbors, and (2) a dynamic cache that adaptively captures nodes with high spatial locality during the second search phase. To further align storage layout with similarity-driven search patterns, GoVector reorders nodes on disk so that similar vectors are colocated on the same or adjacent pages, thereby improving locality and reducing I/O overhead. Extensive experiments on multiple public datasets show that GoVector achieves substantial performance improvements. At 90% recall, it reduces I/O operations by 46% on average, increases query throughput by 1.73x, and lowers query latency by 42% compared to state-of-the-art disk-based graph indexing systems.

Explore more on:
ANN Search
Similar Work
Loading…