Better Generalization With Semantic IDs: A Case Study In Ranking For Recommendations
2023 · Anima Singh, Trung Vu, Nikhil Mehta, et al.
Abstract
Randomly hashed item IDs are used ubiquitously in recommendation models. However, the representations learned from random hashing prevent generalization across similar items, which makes it hard to learn unseen and long-tail items, especially when the item corpus is large, power-law distributed, and evolving dynamically. In this paper, we propose using content-derived features as a replacement for random IDs. We show that simply replacing ID features with content-based embeddings can cause a drop in quality due to reduced memorization capability. To strike a good balance between memorization and generalization, we propose to use Semantic IDs -- a compact discrete item representation learned from frozen content embeddings using RQ-VAE that captures the hierarchy of concepts in items -- as a replacement for random item IDs. Similar to content embeddings, the compactness of Semantic IDs poses a problem for easy adaptation in recommendation models. We propose novel methods for adapting Semantic IDs i
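As a rough illustration of the residual-quantization idea behind RQ-VAE mentioned in the abstract, the sketch below maps an embedding to a tuple of codeword indices, one per level. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `residual_quantize` and the toy codebooks are hypothetical, the codebooks are hand-picked rather than learned, and the encoder, decoder, and training losses of an actual RQ-VAE are omitted.

```python
import numpy as np


def residual_quantize(embedding, codebooks):
    """Return one codeword index per level for a single embedding.

    At each level the nearest codeword to the current residual is chosen,
    then subtracted, so later levels encode progressively finer detail.
    """
    residual = np.asarray(embedding, dtype=float)
    codes = []
    for codebook in codebooks:  # each codebook has shape (num_codes, dim)
        distances = np.linalg.norm(codebook - residual, axis=1)
        index = int(np.argmin(distances))
        codes.append(index)
        residual = residual - codebook[index]
    return tuple(codes)


# Hypothetical toy codebooks (assumption: hand-picked, not learned).
codebooks = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # level 1: coarse concept
    np.array([[0.1, 0.0], [0.0, 0.1]]),  # level 2: finer refinement
]

id_a = residual_quantize([1.0, 0.1], codebooks)
id_b = residual_quantize([1.1, 0.0], codebooks)
id_c = residual_quantize([0.0, 1.1], codebooks)
```

In this toy setup the two nearby embeddings share the same first code and differ only at the second level, while the distant embedding gets a different first code. That shared prefix is the kind of coarse-to-fine structure the abstract refers to as a hierarchy of concepts.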
Related papers
- Unified Semantic And ID Representation Learning For Deep Recommenders (2025) 0.00
- Learning To Collide: Recommendation System Model Compression With Learned Hash Functions (2022) 0.00
- Cost: Contrastive Quantization Based Semantic Tokenization For Generative Recommendation (2024) 7.81
- Learning Compact Compositional Embeddings Via Regularized Pruning For Recommendation (2023) 8.36
- Representation Learning For Efficient And Effective Similarity Search And Recommendation (2021) 0.00
- Mixed-precision Embeddings For Large-scale Recommendation Models (2024) 0.00
- Domain-adaptive And Scalable Dense Retrieval For Content-based Recommendation (2026) 0.00
- Generative Retrieval With Semantic Tree-structured Item Identifiers Via Contrastive Learning (2023) 4.52