Dimension Reduction For Efficient Dense Retrieval Via Conditional Autoencoder
2022 · Zhenghao Liu, Han Zhang, Chenyan Xiong, et al.
Abstract
Dense retrievers use pre-trained language models to encode queries and documents and map them into an embedding space. These embeddings need to be high-dimensional to fit the training signals and preserve retrieval effectiveness. However, high-dimensional embeddings increase index storage and retrieval latency. To reduce the embedding dimension of dense retrieval, this paper proposes a Conditional Autoencoder (ConAE) that compresses the high-dimensional embeddings while maintaining the same embedding distribution and better recovering the ranking features. Our experiments show that ConAE compresses embeddings effectively: it achieves ranking performance comparable to its teacher model while making the retrieval system more efficient. Further analyses show that ConAE can alleviate the redundancy in dense retrieval embeddings with only one linear layer. All code for this work is available at https://github.com/NEUIR/ConAE.
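To make the idea in the abstract concrete, here is a minimal sketch of compressing embeddings with a single linear layer and measuring how far the compressed ranking distribution drifts from the teacher's. It is not the authors' implementation. The abstract does not say how "the same embedding distribution" is enforced; the KL divergence between softmax-normalized dot-product scores used here is an assumption, as are the shared projection for queries and documents, the toy dimensions, the random embeddings, and every function name.

```python
import math
import random


def project(vec, weight):
    """Apply one linear layer (no bias); weight has shape [low_dim][high_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def dot(a, b):
    """Dot-product relevance score between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_distribution(query, docs):
    """Probability assigned to each candidate document under dot-product scoring."""
    return softmax([dot(query, d) for d in docs])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


random.seed(0)
HIGH_DIM, LOW_DIM, NUM_DOCS = 8, 4, 5  # toy sizes, not the paper's settings

# Hypothetical teacher embeddings; a real system would take these
# from a trained dense retriever.
query = [random.gauss(0, 1) for _ in range(HIGH_DIM)]
docs = [[random.gauss(0, 1) for _ in range(HIGH_DIM)] for _ in range(NUM_DOCS)]

# Untrained projection weights; training would adjust them to shrink the KL term.
weight = [[random.gauss(0, 0.5) for _ in range(HIGH_DIM)] for _ in range(LOW_DIM)]

teacher = ranking_distribution(query, docs)
student = ranking_distribution(
    project(query, weight), [project(d, weight) for d in docs]
)
loss = kl_divergence(teacher, student)
print(f"compressed dim: {len(project(query, weight))}, KL(teacher || student) = {loss:.4f}")
```

With random weights the KL term is generally nonzero. Under the assumptions above, a training loop would update `weight` to drive it toward zero, so that the low-dimensional embeddings rank candidates the way the high-dimensional teacher does.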
Code
- NEUIR/ConAE
Related papers
- Dimension Vs. Precision: A Comparative Analysis Of Autoencoders And Quantization For Efficient Vector Retrieval On BEIR Scifact (2025)
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)
- Learning Discrete Representations Via Constrained Clustering For Effective And Efficient Dense Retrieval (2021)
- Less Is More: Pre-train A Strong Text Encoder For Dense Retrieval Using A Weak Decoder (2021)
- Query Encoder Distillation Via Embedding Alignment Is A Strong Baseline Method To Boost Dense Retriever Online Efficiency (2023)
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021)
- Unsupervised Dense Retrieval With Conterfactual Contrastive Learning (2024)
- Interpret And Control Dense Retrieval With Sparse Latent Features (2024)