Breaking The Storage-compute Bottleneck In Billion-scale ANNS: A Gpu-driven Asynchronous I/O Framework
2025 Β· Yang Xiao, Mo Sun, Ziyu Song, et al.
Abstract
With the advancement of information retrieval, recommendation systems, and Retrieval-Augmented Generation (RAG), Approximate Nearest Neighbor Search (ANNS) gains widespread applications due to its higher performance and accuracy. While several disk-based ANNS systems have emerged to handle exponentially growing vector datasets, they suffer from suboptimal performance due to two inherent limitations: 1) failing to overlap SSD accesses with distance computation processes and 2) extended I/O latency caused by suboptimal I/O Stack. To address these challenges, we present FlashANNS, a GPU-accelerated out-of-core graph-based ANNS system through I/O-compute overlapping. Our core insight lies in the synchronized orchestration of I/O and computation through three key innovations: 1) Dependency-Relaxed asynchronous pipeline: FlashANNS decouples I/O-computation dependencies to fully overlap between GPU distance calculations and SSD data transfers. 2) Warp-Level concurrent SSD access: FlashANNS im
Authors
(none)
Tags
Stats
Related papers
- Fusionanns: An Efficient CPU/GPU Cooperative Processing Architecture For Billion-scale Approximate Nearest Neighbor Search (2024)0.00
- A Real-time Adaptive Multi-stream GPU System For Online Approximate Nearest Neighborhood Search (2024)5.84
- DGAI: Decoupled On-disk Graph-based ANN Index For Efficient Updates And Queries (2025)0.00
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search (2021)0.00
- CAGRA: Highly Parallel Graph Construction And Approximate Nearest Neighbor Search For Gpus (2023)12.17
- Ood-diskann: Efficient And Scalable Graph ANNS For Out-of-distribution Queries (2022)0.00
- Aisaq: All-in-storage ANNS With Product Quantization For Dram-free Information Retrieval (2024)0.00
- Freshdiskann: A Fast And Accurate Graph-based ANN Index For Streaming Similarity Search (2021)0.00