Pigeonring: A Principle For Faster Thresholded Similarity Search
2018 · Jianbin Qin, Chuan Xiao
Abstract
The pigeonhole principle states that if \(n\) items are contained in \(m\) boxes, then at least one box has no more than \(n / m\) items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we propose a new principle, called the pigeonring principle, which constrains the number of items in multiple boxes and yields stronger conditions. To utilize the new principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query is constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the new principle, stronger filtering conditions can be established. We show that the pige
Authors
(none)
Tags
Stats
Related papers
- Index-based, High-dimensional, Cosine Threshold Querying With Optimality Guarantees (2018)6.34
- Preference-driven Similarity Join (2017)2.26
- Set Similarity Search Beyond Minhash (2016)10.74
- Sampling A Near Neighbor In High Dimensions -- Who Is The Fairest Of Them All? (2021)5.84
- Fair Near Neighbor Search: Independent Range Sampling In High Dimensions (2019)8.82
- Subsets And Supermajorities: Optimal Hashing-based Set Similarity Search (2019)5.84
- Adaptive Prefiltering For High-dimensional Similarity Search: A Frequency-aware Approach (2025)0.00
- Bitmap Filter: Speeding Up Exact Set Similarity Joins With Bitwise Operations (2017)3.58