Learning Discrete Representations Via Constrained Clustering For Effective And Efficient Dense Retrieval
2021 · Jingtao Zhan, Jiaxin Mao, Yiqun Liu, et al.
Abstract
Dense Retrieval (DR) has achieved state-of-the-art first-stage ranking effectiveness. However, the efficiency of most existing DR models is limited by the large memory cost of storing dense vectors and the time-consuming nearest neighbor search (NNS) in vector space. Therefore, we present RepCONC, a novel retrieval model that learns discrete Representations via CONstrained Clustering. RepCONC jointly trains dual-encoders and the Product Quantization (PQ) method to learn discrete document representations and enables fast approximate NNS with compact indexes. It models quantization as a constrained clustering process, which requires the document embeddings to be uniformly clustered around the quantization centroids and supports end-to-end optimization of the quantization method and dual-encoders. We theoretically demonstrate the importance of the uniform clustering constraint in RepCONC and derive an efficient approximate solution for constrained clustering by reducing it to an instance
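As a rough illustration of the Product Quantization step the abstract refers to, the sketch below splits each document embedding into sub-vectors, replaces each sub-vector with the index of its nearest centroid, and counts how many documents land on each centroid. It is not RepCONC itself: the centroids here are random rather than learned, no dual-encoder is trained, and the uniform clustering constraint is only measured, not enforced. The function names (`pq_encode`, `pq_decode`, `cluster_usage`) and all sizes are assumptions chosen for the example.

```python
import numpy as np


def pq_encode(vectors, codebooks):
    """Map each vector to M centroid indices, one per sub-vector.

    vectors:   (n, d) float array of document embeddings
    codebooks: (M, K, d // M) float array, K centroids per sub-space
    returns:   (n, M) uint8 array of centroid indices
    """
    n, d = vectors.shape
    m, k, sub_d = codebooks.shape
    assert d == m * sub_d, "embedding size must equal M * sub-vector size"
    assert k <= 256, "uint8 codes can only address 256 centroids"
    sub_vectors = vectors.reshape(n, m, sub_d)
    # Squared Euclidean distance from every sub-vector to every centroid
    # of its own sub-space; resulting shape is (n, M, K).
    diffs = sub_vectors[:, :, None, :] - codebooks[None, :, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return dists.argmin(axis=-1).astype(np.uint8)


def pq_decode(codes, codebooks):
    """Rebuild approximate vectors by concatenating the selected centroids."""
    m = codebooks.shape[0]
    parts = codebooks[np.arange(m)[None, :], codes]  # (n, M, sub_d)
    return parts.reshape(codes.shape[0], -1)


def cluster_usage(codes, k):
    """Count documents assigned to each centroid, per sub-space: (M, K)."""
    return np.stack(
        [np.bincount(codes[:, j], minlength=k) for j in range(codes.shape[1])]
    )


# Hypothetical sizes: 1000 documents, 16-dim embeddings, M=4 sub-spaces,
# K=8 centroids per sub-space. Random data stands in for encoder output.
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 16)).astype(np.float32)
codebooks = rng.standard_normal((4, 8, 4)).astype(np.float32)

codes = pq_encode(docs, codebooks)      # 4 bytes per document instead of 64
approx = pq_decode(codes, codebooks)    # lossy reconstruction
usage = cluster_usage(codes, k=8)       # a uniform clustering would be flat
print(usage)
```

Each row of `usage` shows how evenly one sub-space's centroids are used. With unconstrained nearest-centroid assignment the counts are generally uneven, which is the kind of imbalance a uniform clustering constraint is meant to address.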
Related papers
- Jointly Optimizing Query Encoder And Product Quantization To Improve Retrieval Performance (2021) · 12.74
- Improving Document Representations By Generating Pseudo Query Embeddings For Dense Retrieval (2021) · 9.41
- Few-shot Conversational Dense Retrieval (2021) · 16.68
- Disentangled Modeling Of Domain And Relevance For Adaptable Dense Retrieval (2022) · 0.00
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) · 0.00
- Dimension Reduction For Efficient Dense Retrieval Via Conditional Autoencoder (2022) · 8.13
- Curriculum Learning For Dense Retrieval Distillation (2022) · 11.49
- Constructing Tree-based Index For Efficient And Effective Dense Retrieval (2023) · 9.23