Corpusbrain: Pre-train A Generative Retrieval Model For Knowledge-intensive Language Tasks
2022 Β· Jiangui Chen, Ruqing Zhang, Jiafeng Guo, et al.
Abstract
Knowledge-intensive language tasks (KILT) usually require a large body of information to provide correct answers. A popular paradigm to solve this problem is to combine a search system with a machine reader, where the former retrieves supporting evidences and the latter examines them to produce answers. Recently, the reader component has witnessed significant advances with the help of large-scale pre-trained generative models. Meanwhile most existing solutions in the search component rely on the traditional ``index-retrieve-then-rank'' pipeline, which suffers from large memory footprint and difficulty in end-to-end optimization. Inspired by recent efforts in constructing model-based IR models, we propose to replace the traditional multi-step search pipeline with a novel single-step generative model, which can dramatically simplify the search process and be optimized in an end-to-end manner. We show that a strong generative retrieval model can be learned with a set of adequately designe
Authors
(none)
Tags
Stats
Related papers
- REVEAL: Retrieval-augmented Visual-language Pre-training With Multi-source Multimodal Knowledge Memory (2022)13.65
- Pre-training Multi-modal Dense Retrievers For Outside-knowledge Visual Question Answering (2023)7.50
- Murag: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022)14.66
- Graph-based Retriever Captures The Long Tail Of Biomedical Knowledge (2024)0.00
- Generative Cross-modal Retrieval: Memorizing Images In Multimodal Language Models For Retrieval And Beyond (2024)8.35
- Pre-training Vs. Fine-tuning: A Reproducibility Study On Dense Retrieval Knowledge Acquisition (2025)0.95
- Hetarag: Hybrid Deep Retrieval-augmented Generation Across Heterogeneous Data Stores (2025)3.27
- Pre-training Tasks For Embedding-based Large-scale Retrieval (2020)0.00