Abstract

The ongoing Big Data explosion has created a demand for efficient and scalable algorithms for similarity search. Most recent work has focused on \textit\{approximate\} \(k\)-NN search, and while this may be sufficient for some applications, \textit\{exact\} \(k\)-NN search would be ideal for many applications. We present CAKES, a set of three novel, exact algorithms for \(k\)-NN search. CAKES's algorithms are generic over \textit\{any\} distance function, and they \textit\{do not\} scale with the cardinality or embedding dimension of the dataset, but rather with its metric entropy and fractal dimension. We test these claims on datasets from the ANN-Benchmarks suite under commonly-used distance functions, as well as on a genomic dataset with Levenshtein distance and a radio-frequency dataset with Dynamic Time Warping distance. We demonstrate that CAKES exhibits near-constant scaling with cardinality on data conforming to the manifold hypothesis, and has perfect recall on data in \text

Authors

(none)

Tags

  • ANN Search

Stats

  • citations0
  • S2 citationsβ€”
  • github stars21
  • HF likes0
  • heat score2.68
  • arxiv keyprior2023let

Related papers