Synthetic Target Domain Supervision For Open Retrieval QA
2022 · Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, et al.
Abstract
Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR), a state-of-the-art (SOTA) open-domain neural retrieval model, on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore fine-tuning it with synthetic training examples, which we generate from unlabeled target-domain text using a text-to-text generator. In our experiments, this noisy but fully automated target-domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.
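The BM25 + DPR ensemble mentioned above can be illustrated with a minimal hybrid-scoring sketch. It is not the paper's implementation: the abstract does not specify the fusion rule, so this assumes a linear interpolation of min-max-normalized BM25 scores and dense inner-product scores, controlled by a hypothetical weight `alpha`. The toy 3-dimensional vectors stand in for the outputs of DPR's question and passage encoders, and the BM25 parameters `k1=1.5`, `b=0.75` are common defaults rather than values taken from the paper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency of each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used in Lucene)
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            den = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / den
        scores.append(s)
    return scores


def dense_scores(q_vec, p_vecs):
    """DPR-style relevance: inner product of question and passage embeddings."""
    return [sum(q * p for q, p in zip(q_vec, pv)) for pv in p_vecs]


def min_max(xs):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_rank(query, docs, q_vec, p_vecs, alpha=0.5):
    """Rank passages by alpha * dense + (1 - alpha) * BM25 (assumed fusion rule)."""
    sparse = min_max(bm25_scores(query, docs))
    dense = min_max(dense_scores(q_vec, p_vecs))
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return order, fused


# Usage with a toy corpus and hypothetical embeddings
docs = [
    "covid-19 vaccines reduce severe illness".split(),
    "the stock market fell sharply today".split(),
    "masks reduce transmission of respiratory viruses".split(),
]
query = "do vaccines reduce covid-19 illness".split()
q_vec = [0.9, 0.1, 0.3]
p_vecs = [[0.8, 0.0, 0.4], [0.0, 1.0, 0.0], [0.5, 0.1, 0.6]]

order, fused = hybrid_rank(query, docs, q_vec, p_vecs, alpha=0.5)
print(order)  # → [0, 2, 1]
```

Normalizing before interpolation keeps the unbounded BM25 scores from swamping the dense scores (or vice versa); other fusion schemes, such as reciprocal rank fusion, would slot into `hybrid_rank` just as easily.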
Related papers
- Learning To Retrieve Passages Without Supervision (2021)
- Efficient Passage Retrieval With Hashing For Open-domain Question Answering (2021)
- Dense Passage Retrieval: Is It Retrieving? (2024)
- DAPR: A Benchmark On Document-aware Passage Retrieval (2023)
- Embedding-based Zero-shot Retrieval Through Query Generation (2020)
- Improving Dense Passage Retrieval With Multiple Positive Passages (2025)
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023)
- Towards Universal Dense Retrieval For Open-domain Question Answering (2021)