Breaking The Hourglass Phenomenon Of Residual Quantization: Enhancing The Upper Bound Of Generative Retrieval
2024 · Zhirui Kuai, Zuxu Chen, Huimu Wang, et al.
Abstract
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric-based identifier representations to enhance efficiency and generalization. Notably, methods like TIGER, which employ Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critical issue termed the "Hourglass" phenomenon occurs in RQ-SID: intermediate codebook tokens become overly concentrated, hindering the full utilization of generative retrieval methods. This paper analyzes and addresses this problem by identifying data sparsity and long-tailed distribution as the primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings reveal that the "Hourglass" phenomenon substantially impacts the performance of RQ-SID in generative retrieval.
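To make the terms in the abstract concrete, the following is a minimal sketch of greedy residual quantization and of one way to measure per-layer codebook utilization. It is not the authors' implementation: the function names `residual_quantize` and `layer_utilization`, the random (untrained) codebooks, and the Gaussian toy embeddings are all assumptions chosen for illustration. Because the data here is synthetic rather than sparse or long-tailed, the sketch should not be expected to reproduce the "Hourglass" shape; it only shows how token concentration in each layer could be inspected.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Greedy residual quantization (illustrative, codebooks are not trained).

    At each layer, pick the codeword nearest to the current residual,
    record its index, and subtract it before moving to the next layer.
    """
    residual = x.copy()
    codes = np.empty((x.shape[0], len(codebooks)), dtype=np.int64)
    for layer, cb in enumerate(codebooks):
        # Squared Euclidean distance from every residual to every codeword: (n, k)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        codes[:, layer] = idx
        residual = residual - cb[idx]
    return codes, residual


def layer_utilization(codes, codebook_size):
    """Per-layer usage statistics for the assigned tokens."""
    stats = []
    for layer in range(codes.shape[1]):
        counts = np.bincount(codes[:, layer], minlength=codebook_size)
        p = counts / counts.sum()
        nonzero = p[p > 0]
        stats.append({
            "layer": layer,
            "used_tokens": int((counts > 0).sum()),
            # Low entropy or a high top-1 share indicates concentrated tokens.
            "entropy_bits": float(-(nonzero * np.log2(nonzero)).sum()),
            "top1_share": float(p.max()),
        })
    return stats


# Hypothetical toy setup: 1000 items, 8-dim embeddings, 3 layers of 16 codewords.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
codebooks = [rng.normal(size=(16, 8)) * scale for scale in (1.0, 0.5, 0.25)]

codes, residual = residual_quantize(x, codebooks)
stats = layer_utilization(codes, codebook_size=16)
for row in stats:
    print(row)
```

Each row of `codes` plays the role of a semantic identifier, one token per layer. Comparing `entropy_bits` and `top1_share` across layers is one plausible way to check whether a middle layer is more concentrated than the layers around it.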
Related papers
- Differentiable Geometric Indexing For End-to-end Generative Retrieval (2026)
- CoST: Contrastive Quantization Based Semantic Tokenization For Generative Recommendation (2024)
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025)
- Generative Retrieval Meets Multi-graded Relevance (2024)
- Generative Retrieval As Dense Retrieval (2023)
- ASI++: Towards Distributionally Balanced End-to-end Generative Retrieval (2024)
- Continual Learning For Generative Retrieval Over Dynamic Corpora (2023)
- Scalable And Effective Generative Information Retrieval (2023)