Xling: A Learned Filter Framework For Accelerating High-dimensional Approximate Similarity Join
2024 Β· Yifan Wang, Vyom Pathak, Daisy Zhe Wang
Abstract
Similarity join finds all pairs of close points within a given distance threshold. Many similarity join methods have been proposed, but they are usually not efficient on high-dimensional space due to the curse of dimensionality and data-unawareness. We investigate the possibility of using metric space Bloom filter (MSBF), a family of data structures checking if a query point has neighbors in a multi-dimensional space, to speed up similarity join. However, there are several challenges when applying MSBF to similarity join, including excessive information loss, data-unawareness and hard constraint on the distance metric. In this paper, we propose Xling, a generic framework to build a learning-based metric space filter with any existing regression model, aiming at accurately predicting whether a query point has enough number of neighbors. The framework provides a suite of optimization strategies to further improve the prediction quality based on the learning model, which has demonstrated
Authors
(none)
Tags
Stats
Related papers
- Lsf-join: Locality Sensitive Filtering For Distributed All-pairs Set Similarity Under Skew (2020)6.34
- Bitmap Filter: Speeding Up Exact Set Similarity Joins With Bitwise Operations (2017)3.58
- A Framework For Similarity Search With Space-time Tradeoffs Using Locality-sensitive Filtering (2016)8.35
- Dynamic Enumeration Of Similarity Joins (2021)0.00
- Adaptive Prefiltering For High-dimensional Similarity Search: A Frequency-aware Approach (2025)0.00
- Improving Distributed Similarity Join In Metric Space With Error-bounded Sampling (2019)0.00
- Preference-driven Similarity Join (2017)2.26
- Billion-scale Similarity Search Using A Hybrid Indexing Approach With Advanced Filtering (2025)4.52