USM-Lite: Quantization And Sparsity Aware Fine-tuning For Speech Recognition With Universal Speech Models
2023 · Shaojin Ding, David Qiu, David Rim, et al.
Abstract
End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memory usage and computational cost. Therefore, model compression is an important research topic to fit USM-based ASR under budget in real-world scenarios. In this study, we propose a USM fine-tuning approach for ASR, with a low-bit quantization and N:M structured sparsity aware paradigm on the model weights, reducing the model complexity from parameter precision and matrix topology perspectives. We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method. A series of ablation studies validate the effectiveness of up to int4 quantization and 2:4 sparsity. However, a single compression technique fails to recover the performance well under extreme setups including int2 qu
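The two weight-compression ideas named in the abstract, low-bit quantization and N:M structured sparsity, can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the magnitude-based choice of which weights to keep, the reading of 2:4 as "2 nonzero weights in every group of 4", and the symmetric per-row int4 scaling.

```python
import numpy as np

def nm_sparsity_mask(w, n=2, m=4):
    """Boolean mask keeping the n largest-magnitude weights in every group
    of m consecutive weights along the last axis (hypothetical helper)."""
    rows, cols = w.shape
    assert cols % m == 0, "last dimension must be divisible by m"
    groups = np.abs(w).reshape(rows, cols // m, m)
    # Indices of the n largest magnitudes inside each group of m.
    keep = np.argsort(groups, axis=-1)[..., -n:]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return mask.reshape(rows, cols)

def fake_quantize(w, bits=4):
    """Symmetric per-row quantization to signed `bits`-bit integers,
    returning both the dequantized floats and the integer codes."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for int4
    scale = np.max(np.abs(w), axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # guard all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale, q.astype(np.int8)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))

mask = nm_sparsity_mask(w, n=2, m=4)   # 2:4 pattern
w_sparse = w * mask                    # pruned weights stay exactly zero
w_dq, q = fake_quantize(w_sparse, bits=4)
```

In a compression-aware fine-tuning setup, a transform of this kind would typically be applied to the weights in the forward pass while gradients update the full-precision copy; the abstract does not describe the exact training recipe, so this sketch should be read only as a picture of what "int4" and "2:4" mean for a weight matrix.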
Related papers
- A Model For Every User And Budget: Label-free And Personalized Mixed-precision Quantization (2023)
- StableQuant: Layer Adaptive Post-training Quantization For Speech Foundation Models (2025)
- Quantization Of Acoustic Model Parameters In Automatic Speech Recognition Framework (2020)
- ShrinkML: End-to-end ASR Model Compression Using Reinforcement Learning (2019)
- Accurate And Structured Pruning For Efficient Automatic Speech Recognition (2023)
- Towards Lightweight Speaker Verification Via Adaptive Neural Network Quantization (2024)
- UME: Upcycling Mixture-of-experts For Scalable And Efficient Automatic Speech Recognition (2024)
- Model Compression For DNN-based Speaker Verification Using Weight Quantization (2022)