UnIte: Uncertainty-based Iterative Document Sampling For Domain Adaptation In Information Retrieval
2026 · Jongyoon Kim, Minseong Hwang, Seung-Won Hwang
Abstract
arXiv:2604.25142v1
Unsupervised domain adaptation generalizes neural retrievers to an unseen domain by generating pseudo queries on target domain documents. The quality and efficiency of this adaptation depend critically on which documents are selected for pseudo query generation. The existing document sampling method focuses on diversity but fails to capture model uncertainty. In contrast, we propose **Un**certainty-based **Ite**rative Document Sampling (UnIte), which addresses this limitation by (1) filtering out documents with high aleatoric uncertainty and (2) prioritizing those with high epistemic uncertainty, thereby maximizing the learning utility for the current model. We conducted extensive experiments on large BEIR corpora with small and large models, showing significant gains of +2.45 and +3.49 nDCG@10, respectively, with a smaller training sample size (4k on average).
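The two-step selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how either uncertainty is estimated, so the per-document scores are taken as given inputs, and the function name `select_documents`, the threshold `max_aleatoric`, and the `budget` parameter are all hypothetical. It shows a single selection round only; an iterative variant would presumably recompute the scores with the updated model and repeat.

```python
from typing import Sequence


def select_documents(
    aleatoric: Sequence[float],
    epistemic: Sequence[float],
    max_aleatoric: float,
    budget: int,
) -> list[int]:
    """Return indices of documents to pass to pseudo query generation.

    Assumption: both score lists are precomputed, one value per document.
    How they are estimated is outside the scope of this sketch.
    """
    if len(aleatoric) != len(epistemic):
        raise ValueError("score lists must have the same length")
    # Step 1: drop documents whose aleatoric uncertainty exceeds the threshold.
    kept = [i for i, a in enumerate(aleatoric) if a <= max_aleatoric]
    # Step 2: rank the survivors by epistemic uncertainty, highest first.
    kept.sort(key=lambda i: epistemic[i], reverse=True)
    # Keep at most `budget` documents for this round.
    return kept[:budget]


# Toy scores for four documents (made-up values, for illustration only).
aleatoric = [0.1, 0.9, 0.2, 0.3]
epistemic = [0.5, 0.99, 0.8, 0.1]
selected = select_documents(aleatoric, epistemic, max_aleatoric=0.5, budget=2)
print(selected)
```

In the toy data, document 1 has the highest epistemic score but is removed in step 1 because its aleatoric score is above the threshold, which is the point of filtering before ranking.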
Related papers
- Influence Guided Sampling For Domain Adaptation Of Text Retrievers (2026) 0.00
- Adversarial Sampling And Training For Semi-supervised Information Retrieval (2018) 14.43
- Domain Adaptation For Dense Retrieval Through Self-supervision By Pseudo-relevance Labeling (2022) 0.00
- CustomIR: Unsupervised Fine-tuning Of Dense Embeddings For Known Document Corpora (2025) 0.00
- Unsupervised Data Uncertainty Learning In Visual Retrieval Systems (2019) 0.00
- IncDSI: Incrementally Updatable Document Retrieval (2023) 2.16
- UniIR: Training And Benchmarking Universal Multimodal Information Retrievers (2023) 10.48
- IRGAN: A Minimax Game For Unifying Generative And Discriminative Information Retrieval Models (2017) 20.19