Outlier Reduction With Gated Attention For Improved Post-training Quantization In Large Sequence-to-sequence Speech Foundation Models
2024 · Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, et al.
Abstract
This paper explores the improvement of post-training quantization (PTQ) after knowledge distillation in the Whisper speech foundation model family. We address the challenge of outliers in weight and activation tensors, which are known to impede quantization quality in transformer-based language and vision models. Extending this observation to Whisper, we demonstrate that these outliers are also present when transformer-based models are trained to perform automatic speech recognition, necessitating mitigation strategies for PTQ. We show that outliers can be reduced by a recently proposed gating mechanism in the attention blocks of the student model, enabling effective 8-bit quantization and yielding lower word error rates than student models without the gating mechanism.
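To illustrate why outliers impede quantization, the following is a minimal sketch and not the paper's method. It assumes symmetric per-tensor 8-bit quantization with a single scale factor, and the helper names (`quantize_int8`, `mean_abs_error`) and the injected outlier value are hypothetical choices made for illustration. A single large value stretches the quantization range, so the remaining values are represented far more coarsely.

```python
import numpy as np


def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization followed by dequantization.

    Assumption: one scale for the whole tensor, chosen from its largest
    absolute value. Real PTQ pipelines may use other schemes.
    """
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale


def mean_abs_error(x):
    """Mean absolute difference between a tensor and its quantized version."""
    return float(np.mean(np.abs(x - quantize_int8(x))))


rng = np.random.default_rng(0)

# Stand-in for an activation tensor without outliers.
clean = rng.normal(0.0, 1.0, size=4096).astype(np.float32)

# Same tensor with one hypothetical large outlier injected.
with_outlier = clean.copy()
with_outlier[0] = 100.0

err_clean = mean_abs_error(clean)
err_outlier = mean_abs_error(with_outlier)

print(f"mean abs error without outlier: {err_clean:.5f}")
print(f"mean abs error with outlier:    {err_outlier:.5f}")
```

In this toy setting the error on the tensor with the outlier is much larger, because almost all of the 8-bit range is spent covering a single value. Reducing outliers at the source, as the gating mechanism in the abstract is reported to do, avoids that loss of resolution.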
Related papers
- Stablequant: Layer Adaptive Post-training Quantization For Speech Foundation Models (2025)
- Dq-whisper: Joint Distillation And Quantization For Efficient Multilingual Speech Recognition (2023)
- PSST! Prosodic Speech Segmentation With Transformers (2023)
- Mixed Precision Of Quantization Of Transformer Language Models For Speech Recognition (2021)
- Generative Models For Improved Naturalness, Intelligibility, And Voicing Of Whispered Speech (2022)
- A Study On Zero-shot Non-intrusive Speech Assessment Using Large Language Models (2024)
- PTQ4ADM: Post-training Quantization For Efficient Text Conditional Audio Diffusion Models (2024)
- Fine-tuning Whisper On Low-resource Languages For Real-world Applications (2024)