FLASH: Randomized Algorithms Accelerated Over CPU-GPU For Ultra-high Dimensional Similarity Search
2017 Β· Yiqiu Wang, Anshumali Shrivastava, Jonathan Wang, et al.
Abstract
We present FLASH (\textbf\{F\}ast \textbf\{L\}SH \textbf\{A\}lgorithm for \textbf\{S\}imilarity search accelerated with \textbf\{H\}PC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging a LSH style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the prese
Authors
(none)
Tags
Stats
Related papers
- Adaptive Prefiltering For High-dimensional Similarity Search: A Frequency-aware Approach (2025)0.00
- Billion-scale Similarity Search With Gpus (2017)24.96
- Distributed Tera-scale Similarity Search With MPI: Provably Efficient Similarity Search Over Billions Without A Single Distance Computation (2020)0.00
- Qwlsh: Cache-conscious Indexing For Processing Similarity Search Query Workloads In High-dimensional Spaces (2019)4.52
- Improving Similarity Search With High-dimensional Locality-sensitive Hashing (2018)0.00
- Billion-scale Similarity Search Using A Hybrid Indexing Approach With Advanced Filtering (2025)4.52
- Improving Locality Sensitive Hashing By Efficiently Finding Projected Nearest Neighbors (2020)6.77
- Practical Hash Functions For Similarity Estimation And Dimensionality Reduction (2017)0.00