Differentially Private One Permutation Hashing And Bin-wise Consistent Weighted Sampling
2023 Β· Xiaoyun Li, Ping Li
Abstract
Minwise hashing (MinHash) is a standard algorithm widely used in the industry, for large-scale search and learning applications with the binary (0/1) Jaccard similarity. One common use of MinHash is for processing massive n-gram text representations so that practitioners do not have to materialize the original data (which would be prohibitive). Another popular use of MinHash is for building hash tables to enable sub-linear time approximate near neighbor (ANN) search. MinHash has also been used as a tool for building large-scale machine learning systems. The standard implementation of MinHash requires applying \(K\) random permutations. In comparison, the method of one permutation hashing (OPH), is an efficient alternative of MinHash which splits the data vectors into \(K\) bins and generates hash values within each bin. OPH is substantially more efficient and also more convenient to use. In this paper, we combine the differential privacy (DP) with OPH (as well as MinHash), to propose
Authors
(none)
Tags
Stats
Related papers
- Maximally Consistent Sampling And The Jaccard Index Of Probability Distributions (2018)0.00
- Hierarchical One Permutation Hashing: Efficient Multimedia Near Duplicate Detection (2018)4.52
- Probminhash -- A Class Of Locality-sensitive Hash Algorithms For The (probability) Jaccard Similarity (2019)9.92
- Practical Hash Functions For Similarity Estimation And Dimensionality Reduction (2017)0.00
- Superminhash - A New Minwise Hashing Algorithm For Jaccard Similarity Estimation (2017)0.00
- Deep Priority Hashing (2018)11.67
- Shuffle And Learn: Minimizing Mutual Information For Unsupervised Hashing (2020)0.00
- Robust Hashing For Multi-view Data: Jointly Learning Low-rank Kernelized Similarity Consensus And Hash Functions (2016)11.19