A Memory-efficient Distributed Algorithm For Approximate Nearest Neighbour Search With Arbitrary Distances
2024 Β· Elena Garcia-Morato, Maria Jesus Algar, Cesar Alfaro, et al.
Abstract
Approximate nearest neighbour (ANN) search has become a central task in modern data-intensive applications, particularly when operating on large, heterogeneous, or high-dimensional datasets. However, many existing ANN methods struggle in such scenarios, either because they rely on metric assumptions or because their indexing strategies are not well suited to distributed environments or to settings with constrained memory resources. This work introduces PDASC (Parametrizable Distributed Approximate Similarity Search with Clustering), a distributed ANN search algorithm whose index design simultaneously supports arbitrary dissimilarity functions and efficient deployment in distributed, storage-aware environments. PDASC builds a distributed hierarchical index based on clustering mechanisms that are agnostic to distance properties, thereby accommodating non-metric and domain-specific similarities while naturally partitioning indexing and search across multiple computing nodes, with a comp
Authors
(none)
Tags
Stats
Related papers
- PECANN: Parallel Efficient Clustering With Graph-based Approximate Nearest Neighbor Search (2023)0.00
- SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search (2021)0.00
- Parlayann: Scalable And Deterministic Parallel Graph-based Approximate Nearest Neighbor Search Algorithms (2023)10.35
- Approximate Nearest Neighbour Search On Dynamic Datasets: An Investigation (2024)0.00
- Subspace Collision: An Efficient And Accurate Framework For High-dimensional Approximate Nearest Neighbor Search (2024)7.16
- Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out Graph (2017)18.20
- Lorann: Low-rank Matrix Factorization For Approximate Nearest Neighbor Search (2024)2.26
- Ood-diskann: Efficient And Scalable Graph ANNS For Out-of-distribution Queries (2022)0.00