SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning
2024 · Luca Zampierin, Ghouthi Boukli Hacene, Bac Nguyen, et al.
Abstract
Self-supervised learning (SSL) has achieved remarkable success across various speech-processing tasks. To improve its efficiency, previous works often rely on compression techniques. A notable recent attempt is DPHuBERT, which applies joint knowledge distillation (KD) and structured pruning to learn a significantly smaller SSL model. In this paper, we contribute to this research domain by introducing SKILL, a novel method that distills groups of layers rather than individual, arbitrarily selected layers of the teacher network. The layers to distill are identified through a hierarchical clustering procedure applied to layer similarity measures. Extensive experiments demonstrate that our distilled version of WavLM Base+ not only outperforms DPHuBERT but also achieves state-of-the-art results in the 30M-parameter model class across several SUPERB tasks.
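The grouping step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the linkage rule, or the number of groups, so everything here is an assumption: the function name `cluster_layers` is hypothetical, average linkage is one common choice, and the similarity matrix holds toy values rather than measurements from WavLM Base+.

```python
def cluster_layers(similarity, n_groups):
    """Average-linkage agglomerative clustering of teacher layers.

    similarity: symmetric matrix (list of lists) of pairwise layer
                similarities; higher means more similar.
    n_groups:   number of layer groups to stop at (assumed hyperparameter).
    Returns a sorted list of sorted layer-index groups.
    """
    # Start with every layer in its own cluster.
    clusters = [[i] for i in range(len(similarity))]

    def avg_sim(a, b):
        # Mean pairwise similarity between two clusters (average linkage).
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_groups:
        # Find the pair of clusters with the highest average similarity.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


# Toy 6-layer similarity matrix (illustrative values, not from the paper).
toy_similarity = [
    [1.0, 0.9, 0.3, 0.2, 0.2, 0.1],
    [0.9, 1.0, 0.4, 0.3, 0.2, 0.1],
    [0.3, 0.4, 1.0, 0.8, 0.7, 0.3],
    [0.2, 0.3, 0.8, 1.0, 0.9, 0.4],
    [0.2, 0.2, 0.7, 0.9, 1.0, 0.5],
    [0.1, 0.1, 0.3, 0.4, 0.5, 1.0],
]

groups = cluster_layers(toy_similarity, n_groups=3)
print(groups)  # [[0, 1], [2, 3, 4], [5]]
```

Each resulting group would then supply one distillation target for the student. How the layers within a group are combined into that target is not stated in the abstract, so it is left out of this sketch.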
Related papers
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022) 10.97
- Deep Versus Wide: An Analysis Of Student Architectures For Task-agnostic Knowledge Distillation Of Self-supervised Speech Models (2022) 9.23
- One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification (2023) 7.81
- Synergistic Effects Of Knowledge Distillation And Structured Pruning For Self-supervised Speech Models (2025) 0.00
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022) 2.26
- Recycle-and-distill: Universal Compression Strategy For Transformer-based Speech SSL Models With Attention Map Reusing And Masking Distillation (2023) 5.84
- STaR: Distilling Speech Temporal Relation For Lightweight Speech Self-supervised Learning Models (2023) 5.24
- DistilHuBERT: Speech Representation Learning By Layer-wise Distillation Of Hidden-unit BERT (2021) 15.06