Fast-hubert: An Efficient Training Framework For Self-supervised Speech Representation Learning
2023 Β· Guanrou Yang, Ziyang Ma, Zhisheng Zheng, et al.
Abstract
Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potential application and in-depth academic research. To address this issue, we first analyze the computational cost of different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, which is named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the Librispeech 960h benchmark, without performance degradation, resulting in a 5.2x speedup, compared to the original implementation. Moreover, we explore two well-studied techniques in the Fast-HuBERT and demonstrate consistent improvements as reported in previous work.
Authors
(none)
Tags
Stats
Related papers
- Fithubert: Going Thinner And Deeper For Knowledge Distillation Of Speech Self-supervised Learning (2022)10.97
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)0.00
- Multi-resolution Hubert: Multi-resolution Speech Self-supervised Learning With Masked Unit Prediction (2023)0.00
- Unispeech-sat: Universal Speech Representation Learning With Speaker Aware Pre-training (2021)0.00
- Melhubert: A Simplified Hubert On Mel Spectrograms (2022)3.70
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023)6.34
- Ms-hubert: Mitigating Pre-training And Inference Mismatch In Masked Language Modelling Methods For Learning Speech Representations (2024)4.52
- Lighthubert: Lightweight And Configurable Speech Representation Learning With Once-for-all Hidden-unit BERT (2022)15.51