Learning Compressed Sentence Representations For On-device Text Processing
2019 · Dinghan Shen, Pengyu Cheng, Dhanasekar Sundararaman, et al.
Abstract
Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms, such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their …
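To make the binarize-then-compare idea concrete, here is a minimal Python sketch. It assumes the simplest possible scheme, thresholding each dimension at zero, as an illustrative stand-in; it is not necessarily one of the paper's four strategies. It also assumes that the truncated final sentence refers to the Hamming distance, the usual way to compare binary codes. The helper names `binarize`, `hamming_distance` and `storage_reduction` are hypothetical, as are the embedding sizes used in the note below.

```python
def binarize(embedding):
    """Map a real-valued vector to a binary code packed into a Python int.

    Assumption: simple zero-thresholding (bit i is 1 iff embedding[i] > 0),
    used here for illustration only, not as the paper's method.
    """
    code = 0
    for i, value in enumerate(embedding):
        if value > 0:
            code |= 1 << i
    return code


def hamming_distance(code_a, code_b):
    """Count differing bits: XOR the two codes, then count the set bits."""
    return bin(code_a ^ code_b).count("1")


def storage_reduction(dim, bits_per_entry, num_bits):
    """Fraction of storage saved when a `dim`-dimensional vector stored with
    `bits_per_entry` bits per value is replaced by a `num_bits`-bit code."""
    return 1 - num_bits / (dim * bits_per_entry)


if __name__ == "__main__":
    a = [0.3, -1.2, 0.8, -0.1]
    b = [0.5, -0.7, -0.2, -0.4]
    code_a, code_b = binarize(a), binarize(b)
    print(code_a, code_b)                    # 5 1
    print(hamming_distance(code_a, code_b))  # 1
    # Hypothetical sizes: 4096-d float32 input, 2048-bit output code.
    print(storage_reduction(4096, 32, 2048))  # 0.984375
```

Thresholding alone keeps one bit per dimension, so a float32 vector shrinks by 1 - 1/32 = 96.875%; going past 98% also requires a code with fewer bits than input dimensions (the hypothetical 2048-bit code above gives 98.4375%). Comparing codes with XOR and a bit count is also much cheaper than a floating-point inner product.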
Related papers
- Learning Compressed Embeddings For On-device Inference (2022) · 0.00
- Compressing Sentence Representation For Semantic Retrieval Via Homomorphic Projective Distillation (2022) · 2.26
- Vector Representations Of Text Data In Deep Learning (2019) · 0.00
- Experimental Analysis Of Large-scale Learnable Vector Storage Compression (2023) · 7.50
- Ultra-high Dimensional Sparse Representations With Binarization For Efficient Text Retrieval (2021) · 8.60
- Leaner And Faster: Two-stage Model Compression For Lightweight Text-image Retrieval (2022) · 6.34
- Contextual Lensing Of Universal Sentence Representations (2020) · 0.00
- Massively Multilingual Sentence Embeddings For Zero-shot Cross-lingual Transfer And Beyond (2018) · 26.33