Unsupervised Dense Retrieval With Relevance-aware Contrastive Pre-training
2023 · Yibin Lei, Liang Ding, Yu Cao, et al.
Abstract
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios. Contrastive pre-training, which constructs pseudo-positive examples from unlabeled data, has shown great potential to solve this problem. However, the pseudo-positive examples crafted by data augmentations can be irrelevant. To this end, we propose relevance-aware contrastive learning. It takes the intermediate-trained model itself as an imperfect oracle to estimate the relevance of positive pairs and adaptively weighs the contrastive loss of different pairs according to the estimated relevance. Our method consistently improves the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks. Further exploration shows that our method can not only beat BM25 after further pre-training on the target corpus but also serve as a good few-shot learner. Our code is publicly available at https://github.com/Yibin-Lei/ReContriever.
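The weighting idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `relevance_weighted_info_nce`, the temperatures `tau` and `weight_tau`, and the choice of a batch-level softmax over the model's own positive-pair similarities are all assumptions made for illustration. The abstract only states that the model's relevance estimate is used to weight each pair's contrastive loss.

```python
import numpy as np

def relevance_weighted_info_nce(q, k, tau=0.05, weight_tau=1.0):
    """Hypothetical relevance-weighted InfoNCE over a batch of pseudo-positive pairs.

    q, k: arrays of shape (batch, dim); row i of k is the pseudo-positive for row i of q.
    Returns the weighted mean loss and the per-pair weights.
    """
    # L2-normalise so that dot products are cosine similarities.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)

    sim = q @ k.T  # (batch, batch); diagonal holds the positive pairs

    # Standard in-batch InfoNCE: numerically stable log-softmax over each row.
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair_loss = -np.diag(log_prob)

    # Assumed relevance estimate: the model's own similarity for each positive pair.
    # In real training this would be treated as a constant (no gradient through it).
    rel = np.diag(sim) / weight_tau
    w = np.exp(rel - rel.max())
    w = w / w.sum() * len(w)  # softmax weights rescaled to average 1

    return float((w * per_pair_loss).mean()), w
```

Under these assumptions, a pseudo-positive pair that the current model scores as dissimilar contributes less to the loss than a pair it scores as similar, which is the behaviour the abstract describes for irrelevant augmentations.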
Related papers
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021)
- Unsupervised Dense Retrieval With Counterfactual Contrastive Learning (2024)
- Evaluating Contrastive Models For Instance-based Image Retrieval (2021)
- Pre-training Vs. Fine-tuning: A Reproducibility Study On Dense Retrieval Knowledge Acquisition (2025)
- Unsupervised Dense Retrieval Training With Web Anchors (2023)
- Approximate Nearest Neighbor Negative Contrastive Learning For Dense Text Retrieval (2020)
- LaPraDoR: Unsupervised Pretrained Dense Retriever For Zero-shot Text Retrieval (2022)
- Enhancing The Ranking Context Of Dense Retrieval Methods Through Reciprocal Nearest Neighbors (2023)