Censer: Curriculum Semi-supervised Learning For Speech Recognition Based On Self-supervised Pre-training
2022 · Bowen Zhang, Songjun Cao, Xiaoming Zhang, et al.
Abstract
Recent studies have shown that the benefits provided by self-supervised pre-training and self-training (pseudo-labeling) are complementary. Semi-supervised fine-tuning strategies under the pre-training framework, however, remain insufficiently studied. Moreover, modern semi-supervised speech recognition algorithms either treat unlabeled data indiscriminately or filter out noisy samples with a confidence threshold, so the differences among unlabeled samples are often ignored. In this paper, we propose Censer, a semi-supervised speech recognition algorithm based on self-supervised pre-training that aims to make maximal use of unlabeled data. The pre-training stage of Censer adopts wav2vec2.0, and the fine-tuning stage employs an improved semi-supervised learning algorithm derived from slimIPL, which leverages unlabeled data progressively according to the quality of their pseudo labels. We also incorporate a temporal pseudo label pool and an exponential moving average to control the pseudo labels…
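The abstract names three mechanisms only at a high level: using unlabeled data progressively by pseudo-label quality, a pool of pseudo labels, and an exponential moving average (EMA). The Python sketch below illustrates generic versions of each to make the ideas concrete. It is not Censer's implementation. The confidence score, the linear stage schedule in `curriculum_subset`, the stochastic refresh rule in `PseudoLabelPool` (`refresh_prob`, `label_fn`), and the `ema_update` helper are hypothetical names and simplifying assumptions; the pool is only loosely inspired by slimIPL's dynamic cache.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PseudoLabeled:
    """One unlabeled utterance with its current pseudo label (hypothetical record)."""
    utt_id: str
    transcript: str
    confidence: float  # assumed quality proxy in [0, 1], e.g. mean token posterior


def curriculum_subset(samples: List[PseudoLabeled], stage: int,
                      num_stages: int) -> List[PseudoLabeled]:
    """Return the ceil(N * stage / num_stages) most confident samples.

    Assumed linear schedule: early stages train only on high-confidence
    pseudo labels; later stages admit lower-confidence ones until all are used.
    """
    if not 1 <= stage <= num_stages:
        raise ValueError("stage must be in [1, num_stages]")
    ranked = sorted(samples, key=lambda s: s.confidence, reverse=True)
    keep = math.ceil(len(ranked) * stage / num_stages)
    return ranked[:keep]


class PseudoLabelPool:
    """Fixed-size pool of pseudo labels, refreshed stochastically (simplified).

    Each draw returns a cached entry; with probability refresh_prob that slot
    is then overwritten by relabeling a random utterance with the current
    model (label_fn), so pseudo labels change gradually rather than all at once.
    """

    def __init__(self, utt_ids: List[str], size: int, refresh_prob: float,
                 label_fn: Callable[[str], PseudoLabeled], seed: int = 0):
        self.utt_ids = utt_ids
        self.refresh_prob = refresh_prob
        self.label_fn = label_fn
        self.rng = random.Random(seed)
        # Initial pool: label a random subset (requires size <= len(utt_ids)).
        self.pool = [label_fn(u) for u in self.rng.sample(utt_ids, size)]

    def draw(self) -> PseudoLabeled:
        idx = self.rng.randrange(len(self.pool))
        item = self.pool[idx]
        if self.rng.random() < self.refresh_prob:
            # Replace the used slot with a fresh pseudo label from the current model.
            self.pool[idx] = self.label_fn(self.rng.choice(self.utt_ids))
        return item


def ema_update(ema: Dict[str, float], current: Dict[str, float],
               decay: float) -> Dict[str, float]:
    """Per-parameter EMA: ema <- decay * ema + (1 - decay) * current."""
    return {name: decay * ema[name] + (1.0 - decay) * current[name] for name in ema}


if __name__ == "__main__":
    samples = [PseudoLabeled("a", "hi", 0.9), PseudoLabeled("b", "yo", 0.4),
               PseudoLabeled("c", "ok", 0.7), PseudoLabeled("d", "no", 0.2)]
    print([s.utt_id for s in curriculum_subset(samples, stage=1, num_stages=2)])  # ['a', 'c']
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))  # {'w': 0.9}
```

A larger `decay` makes the averaged weights (and any pseudo labels produced from them) change more slowly, and a smaller `refresh_prob` keeps stale pool entries around longer; both are ways of trading label freshness against stability, which is the kind of control the abstract alludes to.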
Related papers
- Self-training And Pre-training Are Complementary For Speech Recognition (2020) · 14.15
- Improving Low-resource Speech Recognition With Pretrained Speech Models: Continued Pretraining Vs. Semi-supervised Training (2022) · 0.00
- Self-training For End-to-end Speech Recognition (2019) · 15.48
- Ccc-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022) · 7.81
- End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019) · 0.00
- Wav2vec-s: Semi-supervised Pre-training For Low-resource ASR (2021) · 7.50
- Weakly-supervised Speech Pre-training: A Case Study On Target Speech Recognition (2023) · 8.09
- SLICER: Learning Universal Audio Representations Using Low-resource Self-supervised Pre-training (2022) · 0.00