Real-time Indexing For Large-scale Recommendation By Streaming Vector Quantization Retriever
2025 · Xingyan Bin, Jianfei Cui, Wujie Yan, et al.
Abstract
Retrievers, one of the most important stages in a recommendation system, are responsible for efficiently selecting likely positive samples for the later stages under strict latency limits. Because of this, large-scale systems typically rely on approximate calculations and indexes, paired with a simple ranking model, to coarsely shrink the candidate set. Since simple models cannot produce precise predictions, most existing methods focus on incorporating more complicated ranking models. However, another fundamental problem, index effectiveness, remains unresolved, and it also limits how complicated the ranking model can become. In this paper, we propose a novel index structure, the streaming Vector Quantization (VQ) model, as a new generation of retrieval paradigm. Streaming VQ assigns items to indexes in real time, which gives it immediacy. Moreover, through meticulous verification of possible variants, it achieves additional benefits such as index balancing and reparability, enabling it to support complic
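The abstract does not give the model's details, so the following is only a generic sketch of what assigning items to a vector-quantization index in real time can look like. It assumes squared Euclidean distance and an exponential-moving-average centroid update. Both are illustrative choices, not the authors' method, and every name here (`StreamingVQIndex`, `upsert`, `candidates`, `decay`) is hypothetical.

```python
class StreamingVQIndex:
    """Toy online VQ index: each item lives in the cluster of its nearest centroid.

    Illustrative only; not the model proposed in the paper.
    """

    def __init__(self, codebook, decay=0.9):
        # codebook: list of initial centroid vectors (assumed given).
        self.codebook = [list(c) for c in codebook]
        self.decay = decay  # assumed EMA factor for centroid updates
        self.item_to_cluster = {}
        self.cluster_to_items = [set() for _ in self.codebook]

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _nearest(self, vec):
        # Index of the centroid closest to vec.
        return min(range(len(self.codebook)),
                   key=lambda k: self._sq_dist(vec, self.codebook[k]))

    def upsert(self, item_id, vec):
        """Assign (or re-assign) an item as soon as a fresh embedding arrives."""
        k = self._nearest(vec)
        old = self.item_to_cluster.get(item_id)
        if old is not None and old != k:
            self.cluster_to_items[old].discard(item_id)
        self.item_to_cluster[item_id] = k
        self.cluster_to_items[k].add(item_id)
        # Move the chosen centroid slightly toward the new embedding.
        c = self.codebook[k]
        for i, v in enumerate(vec):
            c[i] = self.decay * c[i] + (1.0 - self.decay) * v
        return k

    def candidates(self, query_vec, top_clusters=1):
        """Return the items stored in the clusters nearest to the query."""
        ranked = sorted(range(len(self.codebook)),
                        key=lambda k: self._sq_dist(query_vec, self.codebook[k]))
        out = []
        for k in ranked[:top_clusters]:
            out.extend(sorted(self.cluster_to_items[k]))
        return out


index = StreamingVQIndex([[0.0, 0.0], [10.0, 10.0]], decay=0.5)
index.upsert("item_a", [1.0, 1.0])
index.upsert("item_b", [9.0, 9.0])
index.upsert("item_a", [9.0, 10.0])  # item_a's embedding drifted; it is re-indexed
```

Because each update touches only one item and one centroid, the index never needs a batch rebuild, which is the general sense in which a streaming index can stay current.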
Related papers
- CoST: Contrastive Quantization Based Semantic Tokenization For Generative Recommendation (2024)
- Deep Retrieval: Learning A Retrievable Structure For Large-scale Recommendations (2020)
- Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge From Dense Embeddings (2022)
- Jointly Optimizing Query Encoder And Product Quantization To Improve Retrieval Performance (2021)
- Grank: Towards Target-aware And Streamlined Industrial Retrieval With A Generate-rank Framework (2025)
- Mixed-precision Embeddings For Large-scale Recommendation Models (2024)
- Semantic Certainty Assessment In Vector Retrieval Systems: A Novel Framework For Embedding Quality Evaluation (2025)
- Domain-adaptive And Scalable Dense Retrieval For Content-based Recommendation (2026)