GPL: Generative Pseudo Labeling For Unsupervised Domain Adaptation Of Dense Retrieval
2021 · Kexin Wang, Nandan Thakur, Nils Reimers, et al.
Abstract
Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data which is not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of
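The core idea named in the abstract, a query generator combined with pseudo labeling from a cross-encoder, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_generate_query` and `toy_cross_encoder_score` are hypothetical stand-ins for trained models, every other passage is used as a negative, and the pseudo label is taken to be the score margin between the positive and the negative passage. The abstract itself does not specify these details.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PseudoLabeledExample:
    query: str
    positive: str
    negative: str
    margin: float  # score(query, positive) - score(query, negative)


def toy_generate_query(passage: str) -> str:
    """Hypothetical stand-in for a query generator: the first three words."""
    return " ".join(passage.split()[:3])


def toy_cross_encoder_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: the fraction of query
    tokens that also occur in the passage."""
    query_tokens = query.lower().split()
    passage_tokens = set(passage.lower().split())
    if not query_tokens:
        return 0.0
    return sum(t in passage_tokens for t in query_tokens) / len(query_tokens)


def pseudo_label(
    passages: List[str],
    generate_query: Callable[[str], str] = toy_generate_query,
    score: Callable[[str, str], float] = toy_cross_encoder_score,
) -> List[PseudoLabeledExample]:
    """Generate one query per passage, then label each
    (query, positive, negative) triple with a score margin."""
    examples = []
    for i, positive in enumerate(passages):
        query = generate_query(positive)
        for j, negative in enumerate(passages):
            if i == j:
                continue  # a passage is never its own negative
            margin = score(query, positive) - score(query, negative)
            examples.append(
                PseudoLabeledExample(query, positive, negative, margin)
            )
    return examples


if __name__ == "__main__":
    corpus = [
        "dense retrieval needs training data",
        "cross encoders score query passage pairs",
    ]
    for example in pseudo_label(corpus):
        print(example)
```

In a real setting the two stand-ins would be replaced by a trained query generator and a trained cross-encoder, and the resulting margins would serve as soft training targets for the dense retriever in place of hard relevance labels.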
Related papers
- Domain Adaptation For Dense Retrieval And Conversational Dense Retrieval Through Self-supervision By Meticulous Pseudo-relevance Labeling (2024) · 0.00
- Domain Adaptation For Dense Retrieval Through Self-supervision By Pseudo-relevance Labeling (2022) · 0.00
- How To Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval (2023) · 11.39
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025) · 0.00
- Injecting Domain Adaptation With Learning-to-hash For Effective And Efficient Zero-shot Dense Retrieval (2022) · 2.80
- Generative Retrieval As Dense Retrieval (2023) · 0.00
- RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization For Generative Retrieval In E-commerce (2026) · 0.00
- Expandr: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025) · 3.25