Is Smaller Always Faster? Tradeoffs In Compressing Self-supervised Speech Transformers
2022 · Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, et al.
Abstract
Transformer-based self-supervised models have achieved remarkable success in speech processing, but their large size and high inference cost present significant challenges for real-world deployment. While numerous compression techniques have been proposed, inconsistent evaluation metrics make it difficult to compare their practical effectiveness. In this work, we conduct a comprehensive study of four common compression methods for self-supervised speech Transformers: weight pruning, head pruning, low-rank approximation, and knowledge distillation. We evaluate each method on three key metrics: parameter count, multiply-accumulate operations (MACs), and real-time factor (RTF). Results show that each method offers distinct advantages. In addition, we contextualize recent compression techniques by comparing DistilHuBERT, FitHuBERT, LightHuBERT, ARMHuBERT, and STaRHuBERT under the same framework, offering practical guidance on compression for deployment.
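To make the three metrics concrete, the sketch below counts parameters and MACs for a single linear layer before and after a low-rank factorization, and measures the real-time factor as processing time divided by audio duration. It is an illustration only and not the paper's evaluation code: the helper names, the rank of 128, and the 10-second utterance are assumptions chosen for the example, and the 768-to-3072 layer shape with 50 frames per second is borrowed from a HuBERT Base feed-forward layer.

```python
import time


def linear_params(d_in, d_out, bias=True):
    """Parameter count of a dense linear layer with weight shape (d_out, d_in)."""
    return d_in * d_out + (d_out if bias else 0)


def linear_macs(d_in, d_out, n_frames):
    """Multiply-accumulate operations to apply the layer to n_frames vectors."""
    return n_frames * d_in * d_out


def low_rank_params(d_in, d_out, rank, bias=True):
    """Parameters after factoring W (d_out x d_in) into B (d_out x r) @ A (r x d_in)."""
    return rank * (d_in + d_out) + (d_out if bias else 0)


def low_rank_macs(d_in, d_out, rank, n_frames):
    """MACs of the factored layer: first A (r x d_in), then B (d_out x r)."""
    return n_frames * rank * (d_in + d_out)


def real_time_factor(process_fn, audio_seconds):
    """Wall-clock processing time divided by the duration of the audio processed."""
    start = time.perf_counter()
    process_fn()
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


# Illustrative setting (assumed): one 768 -> 3072 feed-forward projection,
# 10 seconds of audio at 50 frames per second, hypothetical rank 128.
D_IN, D_OUT, RANK = 768, 3072, 128
N_FRAMES = 50 * 10

dense = (linear_params(D_IN, D_OUT), linear_macs(D_IN, D_OUT, N_FRAMES))
factored = (
    low_rank_params(D_IN, D_OUT, RANK),
    low_rank_macs(D_IN, D_OUT, RANK, N_FRAMES),
)

# Stand-in workload; in practice process_fn would run the model on the audio.
rtf = real_time_factor(lambda: sum(range(100_000)), audio_seconds=10.0)

print("dense    params, MACs:", dense)
print("low-rank params, MACs:", factored)
print("RTF of the stand-in workload:", rtf)
```

Parameter count and MACs are computed from the architecture alone, whereas the real-time factor depends on the hardware and software stack that runs the model, which is one reason to report all three rather than treating any single one as a proxy for the others.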
Related papers
- LightHuBERT: Lightweight And Configurable Speech Representation Learning With Once-for-all Hidden-unit BERT (2022) 15.51
- Recycle-and-distill: Universal Compression Strategy For Transformer-based Speech SSL Models With Attention Map Reusing And Masking Distillation (2023) 5.84
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022) 10.97
- When To Use Efficient Self Attention? Profiling Text, Speech And Image Transformer Variants (2023) 0.95
- Structured Pruning Of Self-supervised Pre-trained Models For Speech Recognition And Understanding (2023) 11.39
- STaR: Distilling Speech Temporal Relation For Lightweight Speech Self-supervised Learning Models (2023) 5.24
- Fast-HuBERT: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023) 0.00
- Input-independent Attention Weights Are Expressive Enough: A Study Of Attention In Self-supervised Audio Transformers (2020) 0.00