Deep Versus Wide: An Analysis Of Student Architectures For Task-agnostic Knowledge Distillation Of Self-supervised Speech Models
2022 · Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, et al.
Abstract
Self-supervised learning (SSL) is seen as a very promising approach with high performance for several speech downstream tasks. Since the parameters of SSL models are generally so large that training and inference require a lot of memory and computational cost, it is desirable to produce compact SSL models without a significant performance degradation by applying compression methods such as knowledge distillation (KD). Although the KD approach is able to shrink the depth and/or width of SSL model structures, there has been little research on how varying the depth and width impacts the internal representation of the small-footprint model. This paper provides an empirical study that addresses the question. We investigate the performance on SUPERB while varying the structure and KD methods so as to keep the number of parameters constant; this allows us to analyze the contribution of the representation introduced by varying the model architecture. Experiments demonstrate that a certain dept
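The constant-parameter comparison described in the abstract can be illustrated with a rough parameter count for a Transformer encoder stack. This is a minimal sketch, not the paper's setup: the 20M budget, the feed-forward ratio of 4, and the helper names are all hypothetical, and the count ignores the convolutional feature extractor, positional embeddings, and any prediction heads.

```python
# Illustrative sketch only: budget, ffn_ratio, and function names are
# assumptions for this example, not values taken from the paper.

def transformer_layer_params(d_model: int, d_ffn: int) -> int:
    """Weights and biases of one standard Transformer encoder layer."""
    attention = 4 * (d_model * d_model + d_model)      # Q, K, V, output projections
    feed_forward = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                    # two LayerNorms (scale + bias)
    return attention + feed_forward + layer_norms


def encoder_params(n_layers: int, d_model: int, d_ffn: int) -> int:
    """Parameter count of a stack of identical encoder layers."""
    return n_layers * transformer_layer_params(d_model, d_ffn)


def width_for_budget(n_layers: int, budget: int, ffn_ratio: int = 4, step: int = 8) -> int:
    """Hidden size (multiple of `step`) whose stack is closest to `budget`."""
    candidates = range(step, 4096 + step, step)
    return min(
        candidates,
        key=lambda d: abs(encoder_params(n_layers, d, ffn_ratio * d) - budget),
    )


if __name__ == "__main__":
    budget = 20_000_000  # hypothetical student budget
    for depth in (2, 4, 12):
        width = width_for_budget(depth, budget)
        total = encoder_params(depth, width, 4 * width)
        print(f"layers={depth:2d}  d_model={width:4d}  params={total:,}")
```

Under these assumptions, a shallow student needs a much larger hidden size than a deep one to land on the same budget, which is the deep-versus-wide trade-off the study examines.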
Related papers
- SKILL: Similarity-aware Knowledge Distillation For Speech Self-supervised Learning (2024) 3.58
- One-step Knowledge Distillation And Fine-tuning In Using Large Pre-trained Self-supervised Learning Models For Speaker Verification (2023) 7.81
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022) 10.97
- Synergistic Effects Of Knowledge Distillation And Structured Pruning For Self-supervised Speech Models (2025) 0.00
- Application Of Knowledge Distillation To Multi-task Speech Representation Learning (2022) 2.26
- Recycle-and-distill: Universal Compression Strategy For Transformer-based Speech SSL Models With Attention Map Reusing And Masking Distillation (2023) 5.84
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023) 8.35
- Understanding Self-supervised Learning Of Speech Representation Via Invariance And Redundancy Reduction (2023) 0.00