Distributed Tera-scale Similarity Search With MPI: Provably Efficient Similarity Search Over Billions Without A Single Distance Computation
2020 Β· Nicholas Meisburger, Anshumali Shrivastava
Abstract
We present SLASH (Sketched LocAlity Sensitive Hashing), an MPI (Message Passing Interface) based distributed system for approximate similarity search over terabyte scale datasets. SLASH provides a multi-node implementation of the popular LSH (locality sensitive hashing) algorithm, which is generally implemented on a single machine. We show how we can append the LSH algorithm with heavy hitters sketches to provably solve the (high) similarity search problem without a single distance computation. Overall, we mathematically show that, under realistic data assumptions, we can identify the near-neighbor of a given query \(q\) in sub-linear (\( \ll O(n)\)) number of simple sketch aggregation operations only. To make such a system practical, we offer a novel design and sketching solution to reduce the inter-machine communication overheads exponentially. In a direct comparison on comparable hardware, SLASH is more than 10000x faster than the popular LSH package in PySpark. PySpark is a widely-
Authors
(none)
Tags
Stats
Related papers
- FLASH: Randomized Algorithms Accelerated Over CPU-GPU For Ultra-high Dimensional Similarity Search (2017)9.23
- Improving Similarity Search With High-dimensional Locality-sensitive Hashing (2018)0.00
- SQUASH: Serverless And Distributed Quantization-based Attributed Vector Similarity Search (2025)0.00
- Improving Locality Sensitive Hashing By Efficiently Finding Projected Nearest Neighbors (2020)6.77
- Locality Sensitive Hashing For Set-queries, Motivated By Group Recommendations (2020)0.00
- Qwlsh: Cache-conscious Indexing For Processing Similarity Search Query Workloads In High-dimensional Spaces (2019)4.52
- Subsets And Supermajorities: Optimal Hashing-based Set Similarity Search (2019)5.84
- LOH And Behold: Web-scale Visual Search, Recommendation And Clustering Using Locally Optimized Hashing (2016)4.52