Learning Effective Representations For Retrieval Using Self-distillation With Adaptive Relevance Margins
2024 · Lukas Gienapp, Niklas Deckers, Martin Potthast, et al.
Abstract
Representation-based retrieval models, so-called bi-encoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art bi-encoders are trained using an expensive training regime involving knowledge distillation from a teacher model and batch sampling. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained language modeling capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We investigate the capabilities of our proposed approach through extensive experiments, demonstrating that self-distillation can match the effectiveness of teacher distillation using only 13.5% of the data, while offering a speedup in training time of 3x to 15x compared to parametrized losses. All code and data are made openly available.
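To make the bi-encoder setup concrete, here is a minimal sketch in plain Python. `rank` shows the retrieval step the abstract describes: documents are ordered by the cosine similarity of their embeddings to the query embedding. `adaptive_margin_loss` is a hypothetical illustration of a parameter-free margin loss, not the authors' formulation: it assumes the margin for each negative is `1 - cosine(pos, neg)`, i.e. derived from the encoder's own embeddings rather than from a teacher or a fixed margin hyperparameter. All function names are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query: Vector, docs: list[Vector]) -> list[int]:
    """Bi-encoder retrieval: document indices sorted by similarity to the query, best first."""
    scores = [cosine(query, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def adaptive_margin_loss(q: Vector, pos: Vector, negs: list[Vector]) -> float:
    """Hypothetical hinge loss with a per-negative margin taken from the encoder itself.

    ASSUMPTION: margin = 1 - cosine(pos, neg), so a negative that the encoder already
    considers close to the positive demands only a small score gap. This is an
    illustration, not the loss defined in the paper.
    """
    s_pos = cosine(q, pos)
    losses = []
    for neg in negs:
        margin = 1.0 - cosine(pos, neg)  # self-estimated, no tunable margin parameter
        gap = s_pos - cosine(q, neg)  # how far the positive outscores this negative
        losses.append(max(0.0, margin - gap))  # hinge: only violating negatives contribute
    return sum(losses) / len(losses)


print(rank([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]))  # [2, 1, 0]
```

Because the hinge is zero for any negative already separated by its margin, only "hard" negatives contribute to the loss, which is one way a loss can perform hard negative mining implicitly over in-batch negatives instead of requiring a dedicated sampling step.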
Related papers
- Query Encoder Distillation Via Embedding Alignment Is A Strong Baseline Method To Boost Dense Retriever Online Efficiency (2023)
- Domain Adaptation For Dense Retrieval Through Self-supervision By Pseudo-relevance Labeling (2022)
- Knowledge Distillation In Document Retrieval (2019)
- Embeddistill: A Geometric Knowledge Distillation For Information Retrieval (2023)
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023)
- Bixse: Improving Dense Retrieval Via Probabilistic Graded Relevance Distillation (2025)
- Learning More From Less: Towards Strengthening Weak Supervision For Ad-hoc Retrieval (2019)
- Conventional Contrastive Learning Often Falls Short: Improving Dense Retrieval With Cross-encoder Listwise Distillation And Synthetic Data (2025)