Distillation And Pruning For Scalable Self-supervised Representation-based Speech Quality Assessment
2025 · Benjamin Stahl, Hannes Gamper
Abstract
In this paper, we investigate distillation and pruning methods to reduce model size for non-intrusive speech quality assessment based on self-supervised representations. Our experiments build on XLS-R-SQA, a speech quality assessment model using wav2vec 2.0 XLS-R embeddings. We retrain this model on a large compilation of mean opinion score datasets, encompassing over 100,000 labeled clips. For distillation, using this model as a teacher, we generate pseudo-labels on unlabeled degraded speech signals and train student models of varying sizes. For pruning, we use a data-driven strategy. While data-driven pruning performs better at larger model sizes, distillation on unlabeled data is more effective for smaller model sizes. Distillation can halve the gap between the baseline's correlation with ground-truth MOS labels and that of the XLS-R-based teacher model, while reducing model size by two orders of magnitude compared to the teacher model.
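The pseudo-label distillation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `teacher_predict`, the two-dimensional "clip features", the linear student, and all hyperparameters are hypothetical stand-ins chosen so the example runs with the standard library alone. The idea it shows is only the general one: a teacher scores unlabeled inputs, and a smaller student is fit to those scores instead of to human MOS labels.

```python
import random


def teacher_predict(features):
    """Hypothetical stand-in for a large teacher model.

    Maps a 2-d feature vector to a MOS-like score clipped to [1, 5].
    """
    raw = 3.0 + 1.2 * features[0] - 0.8 * features[1]
    return min(5.0, max(1.0, raw))


def make_unlabeled_clips(n, seed=0):
    """Generate n toy 'unlabeled clips' as random 2-d feature vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]


def student_predict(w, b, x):
    """Small linear student: score = w . x + b."""
    return w[0] * x[0] + w[1] * x[1] + b


def train_student(clips, pseudo_labels, epochs=500, lr=0.1):
    """Fit the student to the teacher's pseudo-labels.

    Minimizes mean squared error with full-batch gradient descent.
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(clips)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(clips, pseudo_labels):
            err = student_predict(w, b, x) - y
            grad_w[0] += 2.0 * err * x[0] / n
            grad_w[1] += 2.0 * err * x[1] / n
            grad_b += 2.0 * err / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


# Step 1: the teacher labels data that has no human MOS ratings.
clips = make_unlabeled_clips(500)
pseudo_labels = [teacher_predict(x) for x in clips]

# Step 2: the student is trained only on those pseudo-labels.
w, b = train_student(clips, pseudo_labels)

# Step 3: measure how closely the student reproduces the teacher.
mse = sum(
    (student_predict(w, b, x) - y) ** 2 for x, y in zip(clips, pseudo_labels)
) / len(clips)
print("student weights:", w, "bias:", b, "MSE vs. teacher:", mse)
```

In this toy setting the teacher is itself linear on the sampled range, so the student can match it almost exactly. With a real teacher and a much smaller student, some gap would be expected to remain, which is the trade-off the abstract quantifies.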
Related papers
- Structured Pruning Of Self-supervised Pre-trained Models For Speech Recognition And Understanding (2023)
- Pre-trained Speech Representations As Feature Extractors For Speech Quality Assessment In Online Conferencing Applications (2022)
- Synergistic Effects Of Knowledge Distillation And Structured Pruning For Self-supervised Speech Models (2025)
- On The Impact Of Quantization And Pruning Of Self-supervised Speech Models For Downstream Speech Recognition Tasks "in-the-wild" (2023)
- Accurate And Structured Pruning For Efficient Automatic Speech Recognition (2023)
- Noise Robust Distillation Of Self-supervised Speech Models Via Correlation Metrics (2023)
- More For Less: Non-intrusive Speech Quality Assessment With Limited Annotations (2021)
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022)