CAT-ID\(^2\): Category-tree Integrated Document Identifier Learning For Generative Retrieval In E-commerce
2025 · Xiaoyu Liu, Fuwei Zhang, Yiqing Wu, et al.
Abstract
Generative retrieval (GR) has gained significant attention as an effective paradigm that integrates the capabilities of large language models (LLMs). It generally consists of two stages: constructing discrete semantic identifiers (IDs) for documents and retrieving documents by autoregressively generating ID tokens. The core challenge in GR is how to construct document IDs (DocIDs) with strong representational power. Good IDs should exhibit two key properties: similar documents should have more similar IDs, and each document should maintain a distinct and unique ID. However, most existing methods ignore native category information, which is common and critical in E-commerce. Therefore, we propose a novel ID learning method, CAtegory-Tree Integrated Document IDentifier (CAT-ID\(^2\)), incorporating prior category information into the semantic IDs. CAT-ID\(^2\) includes three key modules: a Hierarchical Class Constraint Loss to integrate category information layer by layer during quantiza
Related papers
- Learning To Tokenize For Generative Retrieval (2023)
- Generative Retrieval Meets Multi-graded Relevance (2024)
- Continual Learning For Generative Retrieval Over Dynamic Corpora (2023)
- Generative Retrieval With Semantic Tree-structured Item Identifiers Via Contrastive Learning (2023)
- GLEN: Generative Retrieval Via Lexical Index Learning (2023)
- Generative Retrieval As Multi-vector Dense Retrieval (2024)
- Extending CLIP For Category-to-image Retrieval In E-commerce (2021)
- Lightweight And Direct Document Relevance Optimization For Generative Information Retrieval (2025)