Forestdsh: A Universal Hash Design For Discrete Probability Distributions
2019 Β· Arash Gholami Davoodi, Sean Chang, Hyun Gon Yoo, et al.
Abstract
In this paper, we consider the problem of classification of \(M\) high dimensional queries \(y^1,\cdots,y^M\in B^S\) to \(N\) high dimensional classes \(x^1,\cdots,x^N\in A^S\) where \(A\) and \(B\) are discrete alphabets and the probabilistic model that relates data to the classes \(P(x,y)\) is known. This problem has applications in various fields including the database search problem in mass spectrometry. The problem is analogous to the nearest neighbor search problem, where the goal is to find the data point in a database that is the most similar to a query point. The state of the art method for solving an approximate version of the nearest neighbor search problem in high dimensions is locality sensitive hashing (LSH). LSH is based on designing hash functions that map near points to the same buckets with a probability higher than random (far) points. To solve our high dimensional classification problem, we introduce distribution sensitive hashes that map jointly generated pairs \((
Authors
(none)
Tags
Stats
Related papers
- Distance-sensitive Hashing (2017)8.82
- Improving Similarity Search With High-dimensional Locality-sensitive Hashing (2018)0.00
- Experimental Analysis Of Locality Sensitive Hashing Techniques For High-dimensional Approximate Nearest Neighbor Searches (2020)6.34
- Locality-sensitive Hashing For F-divergences: Mutual Information Loss And Beyond (2019)0.00
- Practical Hash Functions For Similarity Estimation And Dimensionality Reduction (2017)0.00
- Supervised Discrete Hashing With Relaxation (2019)14.15
- Sharing Hash Codes For Multiple Purposes (2016)0.00
- Discriminative Supervised Hashing For Cross-modal Similarity Search (2018)7.81