Let Them Have CAKES: A Cutting-edge Algorithm For Scalable, Efficient, And Exact Search On Big Data
2023 Β· Morgan E. Prior, Thomas J. Howard, Oliver McLaughlin, et al.
Abstract
The ongoing Big Data explosion has created a demand for efficient and scalable algorithms for similarity search. Most recent work has focused on \textit\{approximate\} \(k\)-NN search, and while this may be sufficient for some applications, \textit\{exact\} \(k\)-NN search would be ideal for many applications. We present CAKES, a set of three novel, exact algorithms for \(k\)-NN search. CAKES's algorithms are generic over \textit\{any\} distance function, and they \textit\{do not\} scale with the cardinality or embedding dimension of the dataset, but rather with its metric entropy and fractal dimension. We test these claims on datasets from the ANN-Benchmarks suite under commonly-used distance functions, as well as on a genomic dataset with Levenshtein distance and a radio-frequency dataset with Dynamic Time Warping distance. We demonstrate that CAKES exhibits near-constant scaling with cardinality on data conforming to the manifold hypothesis, and has perfect recall on data in \text
Authors
(none)
Tags
Stats
Related papers
- Hd-index: Pushing The Scalability-accuracy Boundary For Approximate Knn Search In High-dimensional Spaces (2018)14.02
- High-dimensional Approximate Nearest Neighbor Search: With Reliable And Efficient Distance Comparison Operations (2023)13.44
- Efficient Data-aware Distance Comparison Operations For High-dimensional Approximate Nearest Neighbor Search (2024)5.24
- Unconventional Application Of K-means For Distributed Approximate Similarity Search (2022)5.84
- Scalable K-means Clustering For Large K Via Seeded Approximate Nearest-neighbor Search (2025)0.00
- Effective And General Distance Computation For Approximate Nearest Neighbor Search (2024)5.84
- CANDY: A Benchmark For Continuous Approximate Nearest Neighbor Search With Dynamic Data Ingestion (2024)0.00
- A Memory-efficient Distributed Algorithm For Approximate Nearest Neighbour Search With Arbitrary Distances (2024)0.00