Wav2vec-S: Semi-supervised Pre-training For Low-resource ASR
2021 · Han Zhu, Li Wang, Jindong Wang, et al.
Abstract
Self-supervised pre-training can effectively improve the performance of low-resource automatic speech recognition (ASR). However, existing self-supervised pre-training methods are task-agnostic, i.e., they can be applied to various downstream tasks. Although this broadens their scope of application, the capacity of the pre-trained model is not fully utilized for the ASR task, and the learned representations may not be optimal for ASR. In this work, to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, which uses task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task, thereby using the model's capacity more effectively to generate task-specific representations for ASR. Experiments show that, compared to wav2vec 2.0, wav2vec-S requires only a marginal increase in pre-training time but significantly improves ASR performance on in-domain, cross-domain and cros
Related papers
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022)
- wav2vec: Unsupervised Pre-training For Speech Recognition (2019)
- Improving Low-resource Speech Recognition With Pretrained Speech Models: Continued Pretraining Vs. Semi-supervised Training (2022)
- Performance-efficiency Trade-offs In Unsupervised Pre-training For Speech Recognition (2021)
- wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)
- Wav2Seq: Pre-training Speech-to-text Encoder-decoder Models Using Pseudo Languages (2022)
- Robust wav2vec 2.0: Analyzing Domain Shift In Self-supervised Pre-training (2021)