Learning Compressed Embeddings For On-device Inference
2022 · Niketan Pansare, Jay Katukuri, Aditya Arora, et al.
Abstract
In deep learning, embeddings are widely used to represent categorical entities such as words, apps, and movies. An embedding layer maps each entity to a unique vector, so the layer's memory requirement is proportional to the number of entities. In the recommendation domain, a given category can have hundreds of thousands of entities, and its embedding layer can take gigabytes of memory. The scale of these networks makes them difficult to deploy in resource-constrained environments. In this paper, we propose a novel approach for reducing the size of an embedding table while still mapping each entity to its own unique embedding. Rather than maintaining the full embedding table, we construct each entity's embedding "on the fly" using two separate embedding tables. The first table employs hashing to force multiple entities to share an embedding. The second table contains one trainable weight per entity, allowing the model to distinguish between entities sharing the same embedding.
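The two-table construction described in the abstract can be illustrated with a minimal NumPy sketch. Several details here are assumptions, since the abstract does not specify them: the hash is a plain modulo on the entity ID, the per-entity weight is applied as a multiplicative scale on the shared vector, and the class name `CompressedEmbedding`, its methods, and the table sizes are hypothetical choices made only for illustration.

```python
import numpy as np


class CompressedEmbedding:
    """Illustrative sketch: shared hashed table plus one scalar weight per entity."""

    def __init__(self, num_entities, num_buckets, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.num_buckets = num_buckets
        # Table 1: num_buckets shared vectors; many entities hash to each row.
        self.shared = rng.normal(scale=0.1, size=(num_buckets, dim))
        # Table 2: one trainable scalar per entity (initialised to 1.0).
        self.scale = np.ones(num_entities)

    def lookup(self, entity_ids):
        """Build embeddings on the fly for a 1-D sequence of entity IDs."""
        ids = np.asarray(entity_ids)
        buckets = ids % self.num_buckets  # assumed hash: simple modulo
        # Assumed combination: scale the shared vector by the entity's weight.
        return self.shared[buckets] * self.scale[ids][:, None]

    def num_parameters(self):
        return self.shared.size + self.scale.size


emb = CompressedEmbedding(num_entities=100_000, num_buckets=1_000, dim=64)
emb.scale[1003] = 0.5  # stand-in for a value that training would learn

# Entities 3 and 1003 collide in bucket 3 but still get distinct embeddings.
vecs = emb.lookup([3, 1003])

full_table_params = 100_000 * 64
print(emb.num_parameters(), "parameters vs", full_table_params, "for a full table")
```

Under these assumed sizes, the two tables hold 1,000 × 64 + 100,000 parameters, compared with 100,000 × 64 for a full table. Colliding entities differ only by their scalar weight in this sketch, which is the simplest way to keep each embedding distinct.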
Related papers
- CAFE: Towards Compact, Adaptive, And Fast Embedding For Large-scale Recommendation Models (2023)
- Mixed-precision Embeddings For Large-scale Recommendation Models (2024)
- Mem-rec: Memory Efficient Recommendation System Using Alternative Representation (2023)
- Learning To Collide: Recommendation System Model Compression With Learned Hash Functions (2022)
- Learning Compressed Sentence Representations For On-device Text Processing (2019)
- Fine-grained Embedding Dimension Optimization During Training For Recommender Systems (2024)
- Experimental Analysis Of Large-scale Learnable Vector Storage Compression (2023)
- Semantically Constrained Memory Allocation (SCMA) For Embedding In Efficient Recommendation Systems (2021)