Questions Are All You Need To Train A Dense Passage Retriever
2022 · Devendra Singh Sachan, Mike Lewis, Dani Yogatama, et al.
Abstract
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires access to unpaired inputs and outputs (e.g. questions and potential answer documents). It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question. Training for retrieval based on question reconstruction enables effective unsupervised learning of both document and question encoders, which can be later incorporated into complete Open QA systems without any further finetuning. Extensive experiments demonstrate that ART obtain
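The two-step scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the training objective, so the sketch assumes the retriever's distribution over the retrieved documents is pulled toward a distribution derived from question-reconstruction log-likelihoods by a KL divergence. The function names (`retrieve_top_k`, `art_style_loss`), the toy embeddings, and the reconstruction scores are all hypothetical; in a real system the embeddings would come from trained encoders and the log-likelihoods from a language model scoring the question given each document.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(question_vec, doc_vecs, k):
    """Step (1): rank documents by inner product with the question embedding."""
    scores = [dot(question_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]
    return order, [scores[i] for i in order]


def art_style_loss(retriever_scores, reconstruction_logprobs):
    """Step (2), under the assumption stated above: KL(target || retriever).

    `reconstruction_logprobs[i]` stands in for log p(question | document i).
    Both score lists are normalised over the retrieved documents only.
    """
    log_target = log_softmax(reconstruction_logprobs)
    log_pred = log_softmax(retriever_scores)
    return sum(math.exp(lt) * (lt - lp) for lt, lp in zip(log_target, log_pred))


# Toy data (hypothetical): one question and four documents in a 2-d space.
question_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [-1.0, 0.0]]

top_ids, top_scores = retrieve_top_k(question_vec, doc_vecs, k=3)

# Made-up reconstruction log-likelihoods for the three retrieved documents.
reconstruction_logprobs = [-2.0, -5.0, -9.0]

loss = art_style_loss(top_scores, reconstruction_logprobs)
print("retrieved:", top_ids, "loss:", loss)
```

In this sketch the loss reaches zero only when the retriever's ranking distribution matches the reconstruction-based one, so minimising it would push the encoders to score highly those documents from which the question is easy to reconstruct. No labeled question-document pairs appear anywhere in the computation.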
Related papers
- Towards Universal Dense Retrieval For Open-domain Question Answering (2021) 0.00
- Learning To Retrieve Passages Without Supervision (2021) 8.09
- Pre-training Multi-modal Dense Retrievers For Outside-knowledge Visual Question Answering (2023) 7.50
- Noise-robust Dense Retrieval Via Contrastive Alignment Post Training (2023) 0.00
- Less Is More: Pre-train A Strong Text Encoder For Dense Retrieval Using A Weak Decoder (2021) 14.29
- QAEA-DR: A Unified Text Augmentation Framework For Dense Retrieval (2024) 5.24
- Efficient Passage Retrieval With Hashing For Open-domain Question Answering (2021) 15.77
- Dense Passage Retrieval In Conversational Search (2025) 0.00