ASI++: Towards Distributionally Balanced End-to-end Generative Retrieval
2024 · Yuxuan Liu, Tianchi Yang, Zihan Zhang, et al.
Abstract
Generative retrieval, a promising new paradigm in information retrieval, employs a seq2seq model to encode document features into parameters and decode relevant document identifiers (IDs) based on search queries. Existing generative retrieval solutions typically rely on a preprocessing stage to pre-define document IDs, which can suffer from a semantic gap between these IDs and the retrieval task. However, end-to-end training for both ID assignments and retrieval tasks is challenging due to the long-tailed distribution characteristics of real-world data, resulting in inefficient and unbalanced ID space utilization. To address these issues, we propose ASI++, a novel fully end-to-end generative retrieval method that aims to simultaneously learn balanced ID assignments and improve retrieval performance. ASI++ builds on the fully end-to-end training framework of vanilla ASI and introduces several key innovations. First, a distributionally balanced criterion addresses the imbalance in ID ass
Related papers
- Generative Retrieval As Multi-vector Dense Retrieval (2024) 8.60
- Generative Retrieval Meets Multi-graded Relevance (2024) 2.26
- Differentiable Geometric Indexing For End-to-end Generative Retrieval (2026) 0.00
- Generative Retrieval As Dense Retrieval (2023) 0.00
- Scalable And Effective Generative Information Retrieval (2023) 10.48
- Planning Ahead In Generative Retrieval: Guiding Autoregressive Generation Through Simultaneous Decoding (2024) 8.82
- Breaking The Hourglass Phenomenon Of Residual Quantization: Enhancing The Upper Bound Of Generative Retrieval (2024) 4.52
- Learning To Tokenize For Generative Retrieval (2023) 4.52