Group Testing For Accurate And Efficient Range-based Near Neighbor Search For Plagiarism Detection
2023 Β· Harsh Shah, Kashish Mittal, Ajit Rajwade
Abstract
This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem. Our method efficiently marks each item in a database as neighbor or non-neighbor of a query point, based on a cosine distance threshold without exhaustive search. Like other methods for large scale retrieval, our approach exploits the assumption that most of the items in the database are unrelated to the query. However, it does not assume a large difference between the cosine similarity of the query vector with the least related neighbor and that with the least unrelated non-neighbor. Following a multi-stage adaptive group testing algorithm based on binary splitting, we divide the set of items to be searched into half at each step, and perform dot product tests on smaller and smaller subsets, many of which we are able to prune away. We show that, using softmax-based features, our method achieves a more than ten-fold speed-up over exhaustive search with no loss of ac
Authors
(none)
Tags
Stats
Related papers
- Experimental Analysis Of Locality Sensitive Hashing Techniques For High-dimensional Approximate Nearest Neighbor Searches (2020)6.34
- A Fast Text Similarity Measure For Large Document Collections Using Multi-reference Cosine And Genetic Algorithm (2018)4.52
- A Scalable Solution To The Nearest Neighbor Search Problem Through Local-search Methods On Neighbor Graphs (2017)3.58
- Adaptive Prefiltering For High-dimensional Similarity Search: A Frequency-aware Approach (2025)0.00
- Fast Cosine Similarity Search In Binary Space With Angular Multi-index Hashing (2016)8.60
- High-dimensional Approximate Nearest Neighbor Search: With Reliable And Efficient Distance Comparison Operations (2023)13.44
- PECANN: Parallel Efficient Clustering With Graph-based Approximate Nearest Neighbor Search (2023)0.00
- Learning Nearest Neighbor Graphs From Noisy Distance Samples (2019)0.00