2-bit Random Projections, Nonlinear Estimators, And Approximate Near Neighbor Search
2016 Β· Ping Li, Michael Mitzenmacher, Anshumali Shrivastava
Abstract
The method of random projections has become a standard tool for machine learning, data mining, and search with massive data at Web scale. The effective use of random projections requires efficient coding schemes for quantizing (real-valued) projected data into integers. In this paper, we focus on a simple 2-bit coding scheme. In particular, we develop accurate nonlinear estimators of data similarity based on the 2-bit strategy. This work will have important practical applications. For example, in the task of near neighbor search, a crucial step (often called re-ranking) is to compute or estimate data similarities once a set of candidate data points have been identified by hash table techniques. This re-ranking step can take advantage of the proposed coding scheme and estimator. As a related task, in this paper, we also study a simple uniform quantization scheme for the purpose of building hash tables with projected data. Our analysis shows that typically only a small number of bits a
Authors
(none)
Tags
Stats
Related papers
- Quantization Meets Projection: A Happy Marriage For Approximate K-nearest Neighbor Search (2024)0.00
- Quantization Algorithms For Random Fourier Features (2021)0.00
- Nearest Neighbor Search With Compact Codes: A Decoder Perspective (2021)3.58
- Beyond Neighbourhood-preserving Transformations For Quantization-based Unsupervised Hashing (2021)4.52
- Polysemous Codes (2016)11.49
- Improving Locality Sensitive Hashing By Efficiently Finding Projected Nearest Neighbors (2020)6.77
- Analysis Of Sparsehash: An Efficient Embedding Of Set-similarity Via Sparse Projections (2019)4.52
- Rabitq: Quantizing High-dimensional Vectors With A Theoretical Error Bound For Approximate Nearest Neighbor Search (2024)12.54