CAFE: Towards Compact, Adaptive, And Fast Embedding For Large-scale Recommendation Models
2023 · Hailin Zhang, Zirui Liu, Boxuan Chen, et al.
Abstract
The growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distributions. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses all three requirements. The design philosophy of CAFE is to dynamically allocate more memory to important features (called hot features) and less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. Each reported hot feature is assigned a unique embedding. Non-hot features share embeddings: multiple features map to one embedding through the hash embedding technique. Guided by our design philosophy, we fu
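The allocation idea in the abstract (a unique embedding per hot feature, hashed shared embeddings for the rest) can be illustrated with a minimal Python sketch. It is not the paper's HotSketch: the tracker below is a simple bounded counter in the style of the SpaceSaving algorithm, swapped in for illustration. The names `HotFeatureTracker` and `HybridEmbedding`, the `threshold` parameter, the use of integer feature ids, and the default importance of 1.0 per lookup (plain frequency) are all assumptions made for this example.

```python
import random


class HotFeatureTracker:
    """Toy stand-in for a hot-feature sketch (hypothetical, SpaceSaving-style).

    Keeps at most `capacity` features with an estimated importance score.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.scores = {}  # feature id -> estimated importance

    def update(self, feature_id, importance=1.0):
        """Record one observation; return the evicted feature id, if any."""
        if feature_id in self.scores:
            self.scores[feature_id] += importance
            return None
        if len(self.scores) < self.capacity:
            self.scores[feature_id] = importance
            return None
        # Table is full: evict the lowest-scored feature. The newcomer
        # inherits its score, so estimates can only overshoot.
        victim = min(self.scores, key=self.scores.get)
        floor = self.scores.pop(victim)
        self.scores[feature_id] = floor + importance
        return victim

    def is_hot(self, feature_id, threshold):
        return self.scores.get(feature_id, 0.0) >= threshold


class HybridEmbedding:
    """Unique vectors for hot features, hashed shared rows for the rest."""

    def __init__(self, dim, num_hot_slots, num_shared_rows, threshold, seed=0):
        self._rng = random.Random(seed)
        self.dim = dim
        self.threshold = threshold  # assumed fixed; a real system may adapt it
        self.tracker = HotFeatureTracker(num_hot_slots)
        self.shared = [self._new_vector() for _ in range(num_shared_rows)]
        self.unique = {}  # hot feature id -> its own vector

    def _new_vector(self):
        return [self._rng.gauss(0.0, 0.01) for _ in range(self.dim)]

    def lookup(self, feature_id, importance=1.0):
        evicted = self.tracker.update(feature_id, importance)
        if evicted is not None:
            # A feature that left the tracker loses its unique vector,
            # which keeps the unique table bounded by num_hot_slots.
            self.unique.pop(evicted, None)
        if self.tracker.is_hot(feature_id, self.threshold):
            if feature_id not in self.unique:
                self.unique[feature_id] = self._new_vector()
            return self.unique[feature_id]
        # Non-hot path: integer ids are hashed by modulo into shared rows.
        return self.shared[feature_id % len(self.shared)]


emb = HybridEmbedding(dim=4, num_hot_slots=2, num_shared_rows=3, threshold=3.0)
first = emb.lookup(7)   # score 1.0: still served from a shared row
emb.lookup(7)           # score 2.0
third = emb.lookup(7)   # score 3.0: promoted to a unique vector
cold_a = emb.lookup(10)
cold_b = emb.lookup(13)  # 10 and 13 collide on the same shared row
```

In this toy version a promoted feature starts from a freshly initialised vector, and a feature can be promoted early because SpaceSaving-style counters overestimate. How hot features are initialised, demoted, and scored in CAFE itself is not covered by the text above, so those choices here are placeholders only.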
Related papers
- Mem-rec: Memory Efficient Recommendation System Using Alternative Representation (2023) 0.00
- Semantically Constrained Memory Allocation (SCMA) For Embedding In Efficient Recommendation Systems (2021) 0.00
- Fine-grained Embedding Dimension Optimization During Training For Recommender Systems (2024) 0.00
- Mixed-precision Embeddings For Large-scale Recommendation Models (2024) 0.00
- Learning Compressed Embeddings For On-device Inference (2022) 0.00
- Learning Compact Compositional Embeddings Via Regularized Pruning For Recommendation (2023) 8.36
- Autoemb: Automated Embedding Dimensionality Search In Streaming Recommendations (2020) 12.61
- Efficient Learning Of Sparse Representations From Interactions (2026) 1.57