Hierarchical Corpus Encoder: Fusing Generative Retrieval And Dense Indices
2025 · Tongfei Chen, Ankita Sharma, Adam Pauls, et al.
Abstract
Generative retrieval employs sequence models for conditional generation of document IDs based on a query (DSI (Tay et al., 2022); NCI (Wang et al., 2022); inter alia). While this has led to improved performance in zero-shot retrieval, supporting documents not seen during training remains a challenge. We find that the performance of generative retrieval derives from contrastive training between sibling nodes in a document hierarchy. This motivates our proposal, the hierarchical corpus encoder (HCE), which can be supported by traditional dense encoders. Our experiments show that HCE achieves better results than generative retrieval models under both unsupervised zero-shot and supervised settings, while also allowing documents to be easily added to and removed from the index.
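To make the idea of contrastive training between sibling nodes concrete, the following is a minimal sketch and not the paper's actual objective. It assumes a toy document hierarchy in which every node has a dense embedding, scores a query against nodes by dot product, and sums a softmax cross-entropy term at each level where the gold node competes only against its siblings. The function name `sibling_contrastive_loss`, the toy tree, and all embedding values are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sibling_contrastive_loss(query, path, children, embedding):
    """Sum, over each level of `path`, of -log softmax of the gold node
    scored against its siblings (the children of the same parent).

    Assumption: this is an illustrative loss, not the HCE training objective.
    """
    loss = 0.0
    for parent, gold in zip(path, path[1:]):
        siblings = children[parent]  # includes the gold node itself
        scores = [dot(query, embedding[n]) for n in siblings]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        loss += log_z - dot(query, embedding[gold])
    return loss


# Hypothetical two-level hierarchy: root -> clusters A, B -> documents.
children = {"root": ["A", "B"], "A": ["d1", "d2"], "B": ["d3"]}
embedding = {
    "A": [1.0, 0.0], "B": [0.0, 1.0],
    "d1": [1.0, 0.2], "d2": [0.8, -0.5], "d3": [0.1, 1.0],
}
query = [1.0, 0.1]

# The query points toward cluster A and document d1, so the path to d1
# should incur a lower loss than the path to d3.
loss_d1 = sibling_contrastive_loss(query, ["root", "A", "d1"], children, embedding)
loss_d3 = sibling_contrastive_loss(query, ["root", "B", "d3"], children, embedding)
print(loss_d1 < loss_d3)
```

In this sketch the nodes are plain vectors in a lookup table, so adding or removing a document only touches `children` and `embedding`. This is meant to illustrate, under the stated assumptions, why a dense-encoder formulation makes index updates straightforward.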
Related papers
- Generative Retrieval As Dense Retrieval (2023)
- EHI: End-to-end Learning Of Hierarchical Index For Efficient Dense Retrieval (2023)
- Generative Retrieval As Multi-vector Dense Retrieval (2024)
- Continual Learning For Generative Retrieval Over Dynamic Corpora (2023)
- Generative Dense Retrieval: Memory Can Be A Burden (2024)
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025)
- Precise Zero-shot Dense Retrieval Without Relevance Labels (2022)
- Scalable And Effective Generative Information Retrieval (2023)