Dynamic Enumeration Of Similarity Joins
2021 Β· Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, et al.
Abstract
This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of \(n\) points \(A,B\) in \(\mathbb\{R\}^d\), a metric \(\phi(\cdot)\), and a distance threshold \(r > 0\), report all pairs of points \((a, b) \in A \times B\) with \(\phi(a,b) \le r\). Our goal is to store \(A,B\) into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from \(A\) or \(B\). We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for \(\ell_1, \ell_\infty\) metrics with \(log^\{O(1)\} n\) update time and delay. We show that such a data structure is not feasible for the \(ββ\) metric for \(d \ge 4\). For appro
Authors
(none)
Tags
Stats
Related papers
- Massively-parallel Similarity Join, Edge-isoperimetry, And Distance Correlations On The Hypercube (2016)2.26
- Sublinear Data Structures For Nearest Neighbor In Ultra High Dimensions (2025)0.00
- Improving Distributed Similarity Join In Metric Space With Error-bounded Sampling (2019)0.00
- Optimal Las Vegas Locality Sensitive Data Structures (2017)6.77
- Similarity Join And Similarity Self-join Size Estimation In A Streaming Environment (2018)6.34
- Xling: A Learned Filter Framework For Accelerating High-dimensional Approximate Similarity Join (2024)0.00
- Efficient Similarity Search In Dynamic Data Streams (2016)0.00
- Preference-driven Similarity Join (2017)2.26