LaPraDoR: Unsupervised Pretrained Dense Retriever For Zero-shot Text Retrieval
2022 · Canwen Xu, Daya Guo, Nan Duan, et al.
Abstract
In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving sup
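The abstract describes LEDR only as enhancing dense retrieval with lexical matching. As a rough illustration, the sketch below assumes the two signals are combined by multiplying a BM25 score with a cosine similarity between query and document embeddings. The multiplicative rule, the function names (`bm25_scores`, `cosine`, `hybrid_scores`), the BM25 parameters, and the toy vectors are all assumptions made for this example, not details confirmed by the text above.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against the query with a plain BM25."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hybrid_scores(query_tokens, query_vec, docs_tokens, doc_vecs):
    """Assumed combination rule: lexical score times dense similarity."""
    lexical = bm25_scores(query_tokens, docs_tokens)
    dense = [cosine(query_vec, v) for v in doc_vecs]
    return [l * d for l, d in zip(lexical, dense)]


# Toy corpus; the 2-d vectors stand in for encoder outputs (hypothetical).
docs = [
    ["dense", "retrieval", "model"],
    ["lexical", "matching", "bm25"],
    ["cooking", "pasta", "recipe"],
]
doc_vecs = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
ranking_scores = hybrid_scores(["dense", "retrieval"], [1.0, 0.0], docs, doc_vecs)
best = max(range(len(docs)), key=lambda i: ranking_scores[i])
```

Under this assumed rule, a document with no lexical overlap receives a score of zero regardless of its embedding, so the lexical component acts as a gate on the dense similarity. No second model pass over candidates is needed, unlike re-ranking.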
Related papers
- Injecting Domain Adaptation With Learning-to-hash For Effective And Efficient Zero-shot Dense Retrieval (2022) 2.80
- PromptReps: Prompting Large Language Models To Generate Dense And Sparse Representations For Zero-shot Document Retrieval (2024) 10.61
- Unsupervised Dense Retrieval With Relevance-aware Contrastive Pre-training (2023) 10.44
- A Representation Sharpening Framework For Zero Shot Dense Retrieval (2025) 0.00
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) 0.00
- Dense Text Retrieval Based On Pretrained Language Models: A Survey (2022) 15.95
- Pre-training Vs. Fine-tuning: A Reproducibility Study On Dense Retrieval Knowledge Acquisition (2025) 0.95
- LexLIP: Lexicon-bottlenecked Language-image Pre-training For Large-scale Image-text Retrieval (2023) 10.85