How Does Generative Retrieval Scale To Millions Of Passages?
2023 · Ronak Pradeep, Kai Hui, Jai Gupta, et al.
Abstract
Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer. Although many different approaches have been proposed to improve the effectiveness of generative retrieval, they have only been evaluated on document corpora on the order of 100k in size. We conduct the first empirical study of generative retrieval techniques across various corpus scales, ultimately scaling up to the entire MS MARCO passage ranking task with a corpus of 8.8M passages and evaluating model sizes up to 11B parameters. We uncover several findings about scaling generative retrieval to millions of passages; notably, the central importance of using synthetic queries as document representations during indexing, the ineffectiveness of existing proposed architecture modifications when accounting for c
Related papers
- Scalable And Effective Generative Information Retrieval (2023) 10.48
- Generative Retrieval As Dense Retrieval (2023) 0.00
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025) 0.00
- Generative Retrieval As Multi-vector Dense Retrieval (2024) 8.60
- Learning To Rank In Generative Retrieval (2023) 11.91
- Generative Retrieval Meets Multi-graded Relevance (2024) 2.26
- Generative Dense Retrieval: Memory Can Be A Burden (2024) 4.52
- Evaluating Dense Passage Retrieval Using Transformers (2022) 0.00