GB-KMV: An Augmented KMV Sketch For Approximate Containment Similarity Search
2018 Β· Yang Yang, Ying Zhang, Wenjie Zhang, et al.
Abstract
In this paper, we study the problem of approximate containment similarity search. Given two records Q and X, the containment similarity between Q and X with respect to Q is |Q intersect X|/ |Q|. Given a query record Q and a set of records S, the containment similarity search finds a set of records from S whose containment similarity regarding Q are not less than the given threshold. This problem has many important applications in commercial and scientific fields such as record matching and domain search. Existing solution relies on the asymmetric LSH method by transforming the containment similarity to well-studied Jaccard similarity. In this paper, we use a different framework by transforming the containment similarity to set intersection. We propose a novel augmented KMV sketch technique, namely GB-KMV, which is data-dependent and can achieve a good trade-off between the sketch size and the accuracy. We provide a set of theoretical analysis to underpin the proposed augmented KMV sket
Authors
(none)
Tags
Stats
Related papers
- Unconventional Application Of K-means For Distributed Approximate Similarity Search (2022)5.84
- Quantization Meets Projection: A Happy Marriage For Approximate K-nearest Neighbor Search (2024)0.00
- Mmlsh: A Practical And Efficient Technique For Processing Approximate Nearest Neighbor Queries On Multimedia Data (2020)4.52
- Set Similarity Search Beyond Minhash (2016)10.74
- Interleaved Composite Quantization For High-dimensional Similarity Search (2019)0.00
- Kernel Density Estimation Through Density Constrained Near Neighbor Search (2020)5.84
- Subsets And Supermajorities: Optimal Hashing-based Set Similarity Search (2019)5.84
- Indexing Metric Spaces For Exact Similarity Search (2020)10.85