DistilHuBERT: Speech Representation Learning By Layer-wise Distillation Of Hidden-unit BERT
2021 · Heng-Jui Chang, Shu-Wen Yang, Hung-Yi Lee
Abstract
Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite their success, these methods require large memory and incur high pre-training costs, making them inaccessible to researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations directly from a HuBERT model. This method reduces HuBERT's size by 75% and makes it 73% faster while retaining most of its performance on ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening the possibility of pre-training personal and on-device SSL models for speech.
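The abstract does not spell out the training objective, so the following is only a minimal sketch of what layer-wise distillation with multiple prediction heads could look like. It assumes a per-layer loss that combines an L1 distance with a log-sigmoid cosine-similarity term. The function names, tensor shapes, linear heads, the number of teacher layers, and the weight `lam` are all illustrative assumptions, not details taken from this page.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def layer_loss(pred, target, lam=1.0):
    """Assumed per-layer loss between predicted and teacher features, shape (T, D)."""
    l1 = np.abs(pred - target).mean(axis=-1)  # per-frame L1, averaged over D
    num = (pred * target).sum(axis=-1)
    den = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1) + 1e-8
    cos = num / den                            # per-frame cosine similarity
    return float((l1 - lam * log_sigmoid(cos)).mean())

def distill_loss(student_out, heads, teacher_layers, lam=1.0):
    """Sum the per-layer losses, using one (hypothetical) linear head per teacher layer."""
    total = 0.0
    for (W, b), target in zip(heads, teacher_layers):
        pred = student_out @ W + b             # head maps shared student features
        total += layer_loss(pred, target, lam)
    return total

# Toy data standing in for real model outputs (random, for illustration only).
rng = np.random.default_rng(0)
T, D = 50, 16
student_out = rng.normal(size=(T, D))
teacher_layers = [rng.normal(size=(T, D)) for _ in range(3)]
heads = [(rng.normal(size=(D, D)) * 0.1, np.zeros(D)) for _ in range(3)]
loss = distill_loss(student_out, heads, teacher_layers)
```

The multi-task aspect shows up as one head per teacher layer sharing the same student output, so a single small student is trained to predict several teacher layers at once.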
Related papers
- FitHuBERT: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022) · 10.97
- GenDistiller: Distilling Pre-trained Language Models Based On An Autoregressive Generative Model (2024) · 2.26
- Fast-HuBERT: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023) · 0.00
- LightHuBERT: Lightweight And Configurable Speech Representation Learning With Once-for-all Hidden-unit BERT (2022) · 15.51
- HuBERT: Self-supervised Speech Representation Learning By Masked Prediction Of Hidden Units (2021) · 25.30
- GenDistiller: Distilling Pre-trained Language Models Based On Generative Models (2023) · 0.00
- Recycle-and-Distill: Universal Compression Strategy For Transformer-based Speech SSL Models With Attention Map Reusing And Masking Distillation (2023) · 5.84
- STaR: Distilling Speech Temporal Relation For Lightweight Speech Self-supervised Learning Models (2023) · 5.24