How To Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
2023 · Sheng-Chieh Lin, Akari Asai, Minghan Li, et al.
Abstract
Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval, which some argue is due to limited model capacity. We contradict this hypothesis and show that a generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. In particular, we systematically examine the contrastive learning of DRs under the framework of Data Augmentation (DA). Our study shows that common DA practices, such as query augmentation with generative models and pseudo-relevance label creation using a cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA approach with diverse queries and sources of supervision to progressively train a generalizable DR. As a result, DRAGON, our dense retriever trained with diverse a
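The abstract refers to contrastive learning of dense retrievers without stating the objective. As a rough illustration only, the sketch below shows a generic in-batch-negative contrastive (InfoNCE-style) loss of the kind commonly used for dense retrieval. It is not taken from the paper and does not reproduce DRAGON's training recipe; the function names, the `temperature` parameter, and the toy vectors are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot-product relevance score between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_contrastive_loss(query_vecs, passage_vecs, temperature=1.0):
    """Mean InfoNCE-style loss over a batch (hypothetical helper).

    passage_vecs[i] is treated as the positive passage for query_vecs[i];
    every other passage in the batch serves as a negative for that query.
    """
    if len(query_vecs) != len(passage_vecs):
        raise ValueError("each query needs exactly one positive passage")
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) / temperature for p in passage_vecs]
        # Numerically stable log-sum-exp over all passages in the batch.
        top = max(scores)
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        # Negative log-probability of the positive passage.
        total += log_z - scores[i]
    return total / len(query_vecs)


# Toy 2-dimensional embeddings, invented for illustration.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # positives match their queries
swapped = [[0.0, 1.0], [1.0, 0.0]]   # positives point away from their queries

loss_aligned = in_batch_contrastive_loss(queries, aligned)
loss_swapped = in_batch_contrastive_loss(queries, swapped)
```

In this toy setup the loss is lower when each query embedding lies close to its own positive passage than when the positives are swapped, which is the behavior a contrastive objective rewards. Under this view, augmented queries and relevance labels from different sources would simply change which (query, positive) pairs enter the batch.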
Related papers
- Disentangled Modeling Of Domain And Relevance For Adaptable Dense Retrieval (2022)
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020)
- Few-shot Conversational Dense Retrieval (2021)
- Towards Dynamic Dense Retrieval With Routing Strategy (2026)
- Expandr: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025)
- Interpreting Dense Retrieval As Mixture Of Topics (2021)
- Soft Prompt Tuning For Augmenting Dense Retrieval With Large Language Models (2023)
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025)