Domain Adaptation For Dense Retrieval Through Self-supervision By Pseudo-relevance Labeling
2022 · Minghan Li, Eric Gaussier
Abstract
Although neural information retrieval has improved greatly, recent work has shown that dense retrieval models generalize poorly to target domains whose distributions differ from the training domain, in contrast to interaction-based models. To address this issue, researchers have turned to adversarial learning and query generation; both approaches, however, have yielded only limited improvements. In this paper, we propose a self-supervision approach in which pseudo-relevance labels are generated automatically on the target domain. We first use the standard BM25 model on the target domain to obtain an initial ranking of documents, and then use the interaction-based model T5-3B to re-rank the top documents. We further combine this approach with knowledge distillation from an interaction-based teacher model trained on the source domain. Our experiments reveal that pseudo-relevance labeling using T5-3B and the MiniLM teacher
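The labeling pipeline described in the abstract (BM25 first-stage ranking, cross-encoder re-ranking of the top candidates, then distillation into a dense student) can be sketched as below. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: `toy_reranker` is a hypothetical word-overlap stand-in for the T5-3B re-ranker, the BM25 parameters (`k1=1.5`, `b=0.75`), the positive/negative selection rule (`n_pos`, `n_neg`), and the choice of a Margin-MSE distillation loss are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class BM25:
    """Okapi BM25 first-stage ranker (k1 and b are assumed common defaults)."""

    def __init__(self, docs: List[str], k1: float = 1.5, b: float = 0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Non-negative IDF variant: the "1 +" inside the log keeps it >= 0.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        self.tf = [Counter(d) for d in self.docs]

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], len(self.docs[i])
        s = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return s

    def rank(self, query: str, k: int) -> List[int]:
        # Highest score first; ties broken by document index.
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))[:k]]


def toy_reranker(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder such as T5-3B: query-term overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q)


def pseudo_label(query: str, docs: List[str], bm25: BM25,
                 rerank: Callable[[str, str], float],
                 top_k: int = 100, n_pos: int = 1, n_neg: int = 4) -> Tuple[List[int], List[int]]:
    """Re-rank BM25's top_k; treat the best as positives, the lowest-ranked as negatives."""
    candidates = bm25.rank(query, top_k)
    reranked = sorted(candidates, key=lambda i: -rerank(query, docs[i]))  # stable sort
    positives = reranked[:n_pos]
    negatives = reranked[n_pos:][-n_neg:]  # disjoint from positives by construction
    return positives, negatives


def margin_mse(s_pos: float, s_neg: float, t_pos: float, t_neg: float) -> float:
    """Assumed distillation loss: match the student's pos-neg margin to the teacher's."""
    return ((s_pos - s_neg) - (t_pos - t_neg)) ** 2


docs = [
    "dense retrieval with dual encoders",
    "bm25 is a sparse lexical ranking function",
    "cooking pasta at home",
    "domain adaptation for dense retrieval models",
]
query = "domain adaptation dense retrieval"
bm25 = BM25(docs)
pos, neg = pseudo_label(query, docs, bm25, toy_reranker, top_k=4, n_pos=1, n_neg=2)
print(pos, neg)  # [3] [1, 2]
```

Taking negatives from the bottom of the re-ranked list, rather than just below the positives, is one way to reduce the chance of labeling an unjudged relevant document as negative; other selection rules are equally plausible.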
Related papers
- Domain Adaptation For Dense Retrieval And Conversational Dense Retrieval Through Self-supervision By Meticulous Pseudo-relevance Labeling (2024) · 0.00
- Learning More From Less: Towards Strengthening Weak Supervision For Ad-hoc Retrieval (2019) · 5.84
- Learning Effective Representations For Retrieval Using Self-distillation With Adaptive Relevance Margins (2024) · 2.26
- GPL: Generative Pseudo Labeling For Unsupervised Domain Adaptation Of Dense Retrieval (2021) · 17.47
- Teaching Dense Retrieval Models To Specialize With Listwise Distillation And LLM Data Augmentation (2025) · 0.00
- Disentangled Modeling Of Domain And Relevance For Adaptable Dense Retrieval (2022) · 0.00
- Towards Consistency Filtering-free Unsupervised Learning For Dense Retrieval (2023) · 0.00
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023) · 0.00