Embedding-based Zero-shot Retrieval Through Query Generation
2020 · Davis Liang, Peng Xu, Siamak Shakeri, et al.
Abstract
Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query and cannot be finetuned to downstream datasets. In this work, we consider the embedding-based two-tower architecture as our neural retrieval model. Since labeled data can be scarce and because neural retrieval models require vast amounts of data to train, we propose a novel method for generating synthetic training data for retrieval. Our system produces remarkable results, significantly outperforming BM25 on 5 out of 6 datasets tested, by an average of 2.45 points for Recall@1. In some cases, our model trained on synthetic data can even outperform the same model trained on real data.
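To make the abstract's terms concrete, the following is a minimal sketch of how a two-tower retriever scores passages and how Recall@1 can be computed. It is not the authors' implementation. It assumes that the query and passage embeddings have already been produced by two encoders, that relevance is scored by dot product, and that each query has exactly one gold passage. The vectors, passage ids, and function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the (assumed) relevance score."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_vec: Vector, passage_vecs: Dict[str, Vector]) -> List[str]:
    """Return passage ids sorted from highest to lowest score."""
    return sorted(
        passage_vecs,
        key=lambda pid: dot(query_vec, passage_vecs[pid]),
        reverse=True,
    )


def recall_at_k(rankings: List[List[str]], gold: List[str], k: int = 1) -> float:
    """Fraction of queries whose gold passage appears in the top k results."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)


# Toy, hand-written embeddings (hypothetical). In a real system these would
# come from a trained passage encoder and a trained query encoder.
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.0, 0.2, 0.9],
}
queries = [[1.0, 0.0, 0.1], [0.0, 0.1, 1.0], [0.2, 0.3, 0.6]]
gold = ["p1", "p3", "p2"]  # one assumed gold passage id per query

rankings = [rank_passages(q, passages) for q in queries]
score = recall_at_k(rankings, gold, k=1)
print(f"Recall@1 = {score:.3f}")
```

Because the passage embeddings do not depend on the query, they can be computed once and indexed ahead of time, which is what makes this architecture practical for large corpora.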
Related papers
- Improving Passage Retrieval With Zero-shot Question Generation (2022)
- Out-of-domain Semantics To The Rescue! Zero-shot Hybrid Retrieval Models (2022)
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023)
- Don't Retrieve, Generate: Prompting LLMs For Synthetic Training Data In Dense Retrieval (2025)
- Precise Zero-shot Dense Retrieval Without Relevance Labels (2022)
- Generative Retrieval As Dense Retrieval (2023)
- Augmenting Passage Representations With Query Generation For Enhanced Cross-lingual Dense Retrieval (2023)
- A Representation Sharpening Framework For Zero Shot Dense Retrieval (2025)