Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment Of Embeddings For Asymmetrical Dual Encoders
2023 · Daniel Campos, Alessandro Magnani, Chengxiang Zhai
Abstract
In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders. First, we investigate the impact of pre- and post-training compression on MSMARCO, Natural Questions, TriviaQA, SQuAD, and SciFact, finding that asymmetry in the dual encoders in dense retrieval can lead to improved inference efficiency. Knowing this, we introduce Kullback-Leibler Alignment of Embeddings (KALE), an efficient and accurate method for increasing the inference efficiency of dense retrieval methods by pruning and aligning the query encoder after training. Specifically, KALE extends traditional Knowledge Distillation after bi-encoder training, allowing for effective query encoder compression without full retraining or index generation. Using KALE and asymmetric training, we can generate models which exceed the performance of DistilBERT despite having
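The alignment idea in the abstract can be illustrated with a minimal sketch. It assumes each embedding is turned into a probability distribution with a softmax before the Kullback-Leibler divergence is taken. The abstract does not state this detail, so the function names, the `temperature` parameter, and the toy vectors below are hypothetical and are not the authors' implementation. The teacher stands for the original query encoder and the student for its pruned copy. Only the student would be updated, so the document index built with the context encoder would remain valid.

```python
import math

def softmax(vec, temperature=1.0):
    """Turn a raw embedding into a probability distribution (assumed step)."""
    scaled = [v / temperature for v in vec]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(teacher_embeddings, student_embeddings, temperature=1.0):
    """Mean KL divergence between teacher and student query embeddings."""
    losses = [
        kl_divergence(softmax(t, temperature), softmax(s, temperature))
        for t, s in zip(teacher_embeddings, student_embeddings)
    ]
    return sum(losses) / len(losses)

# Toy query embeddings (hypothetical values, three dimensions each).
teacher = [[0.2, -1.0, 0.7], [1.5, 0.3, -0.4]]
aligned = [[0.2, -1.0, 0.7], [1.5, 0.3, -0.4]]   # student matches the teacher
drifted = [[-0.7, 1.0, 0.1], [0.0, 0.9, 1.2]]    # student after lossy pruning

loss_aligned = alignment_loss(teacher, aligned)
loss_drifted = alignment_loss(teacher, drifted)
print(loss_aligned, loss_drifted)
```

In a real setting this loss would be minimized by gradient descent over the pruned query encoder's parameters. The sketch only computes the quantity being minimized, which is zero when the student reproduces the teacher's embeddings and grows as they diverge.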
Related papers
- Query Encoder Distillation Via Embedding Alignment Is A Strong Baseline Method To Boost Dense Retriever Online Efficiency (2023)
- Scaling Sparse And Dense Retrieval In Decoder-only LLMs (2025)
- Noise-robust Dense Retrieval Via Contrastive Alignment Post Training (2023)
- Back To Basics: A Simple Recipe For Improving Out-of-domain Retrieval In Dense Encoders (2023)
- CODER: An Efficient Framework For Improving Retrieval Through Contextual Document Embedding Reranking (2021)
- Align Then Train: Efficient Retrieval Adapter Learning (2026)
- Dimension Reduction For Efficient Dense Retrieval Via Conditional Autoencoder (2022)
- Pre-training Vs. Fine-tuning: A Reproducibility Study On Dense Retrieval Knowledge Acquisition (2025)