Lighthubert: Lightweight And Configurable Speech Representation Learning With Once-for-all Hidden-unit BERT
2022 Β· Rui Wang, Qibing Bai, Junyi Ao, et al.
Abstract
Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pruning structured parameters. More precisely, we create a Transformer-based supernet that is nested with thousands of weight-sharing subnets and design a two-stage distillation strategy to leverage the contextualized latent representations from HuBERT. Experiments on automatic speech recognition (ASR) and the SUPERB benchmark show the proposed LightHuBERT enables over \(10^9\) architectures concerning the embedding dimension, attention dimension, head number, feed-forward network ratio, and network depth. LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks with the HuBERT s
Authors
(none)
Tags
Stats
Related papers
- Is Smaller Always Faster? Tradeoffs In Compressing Self-supervised Speech Transformers (2022)0.00
- Fast-hubert: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023)0.00
- Distilhubert: Speech Representation Learning By Layer-wise Distillation Of Hidden-unit BERT (2021)15.06
- Fithubert: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022)10.97
- Melhubert: A Simplified Hubert On Mel Spectrograms (2022)3.70
- Star: Distilling Speech Temporal Relation For Lightweight Speech Self-supervised Learning Models (2023)5.24
- Multi-resolution Hubert: Multi-resolution Speech Self-supervised Learning With Masked Unit Prediction (2023)0.00
- Hubert: Self-supervised Speech Representation Learning By Masked Prediction Of Hidden Units (2021)25.30