InterMPL: Momentum Pseudo-labeling With Intermediate CTC Loss
2022 · Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, et al.
Abstract
This paper presents InterMPL, a semi-supervised learning method for end-to-end automatic speech recognition (ASR) that performs pseudo-labeling (PL) with intermediate supervision. Momentum PL (MPL) trains a connectionist temporal classification (CTC)-based model on unlabeled data by continuously generating pseudo-labels on the fly and improving their quality. In contrast to autoregressive formulations, such as the attention-based encoder-decoder and transducer, CTC is well suited for MPL, or PL-based semi-supervised ASR in general, owing to its simple and fast inference algorithm and its robustness against generating collapsed labels. However, CTC generally performs worse than autoregressive models because of its conditional independence assumption, which limits the performance of MPL. We propose to enhance MPL by introducing intermediate loss, inspired by the recent advances in CTC-based modeling. Specifically, we focus on self-conditional and hierarchical conditional CTC, tha
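The ingredients named in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function names, the blank index `0`, the momentum coefficient `alpha`, and the interpolation weight `weight` are all assumptions chosen for illustration. It shows (1) the simple CTC greedy decoding that makes on-the-fly pseudo-label generation cheap, (2) a momentum (exponential moving average) update of the model that generates pseudo-labels, and (3) one common way to mix a final-layer CTC loss with intermediate-layer CTC losses.

```python
from typing import Dict, List, Sequence

BLANK = 0  # assumed index of the CTC blank token


def ctc_greedy_decode(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Turn per-frame argmax labels into a label sequence:
    collapse consecutive repeats, then drop blanks."""
    labels: List[int] = []
    prev = None
    for token in frame_ids:
        if token != prev and token != blank:
            labels.append(token)
        prev = token
    return labels


def momentum_update(
    offline: Dict[str, float], online: Dict[str, float], alpha: float
) -> Dict[str, float]:
    """Moving average of weights: the pseudo-label-generating (offline) model
    slowly follows the model being trained (online). Scalars stand in for tensors."""
    return {k: alpha * offline[k] + (1.0 - alpha) * online[k] for k in offline}


def combined_ctc_loss(
    final_loss: float, intermediate_losses: Sequence[float], weight: float = 0.5
) -> float:
    """Interpolate the final-layer CTC loss with the mean of the
    intermediate-layer CTC losses (weight is a hypothetical hyperparameter)."""
    inter = sum(intermediate_losses) / len(intermediate_losses)
    return (1.0 - weight) * final_loss + weight * inter


if __name__ == "__main__":
    # Frame-level predictions from the offline model on an unlabeled utterance.
    frames = [0, 3, 3, 0, 3, 5, 5, 0]
    print(ctc_greedy_decode(frames))
    print(momentum_update({"w": 1.0}, {"w": 0.0}, alpha=0.9))
    print(combined_ctc_loss(2.0, [4.0], weight=0.5))
```

In an actual training loop the decoded sequence would serve as the target for the online model's CTC losses on the unlabeled utterance, and the momentum update would run after each optimizer step; both of those details are assumptions here rather than statements from the abstract.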
Related papers
- Advancing Momentum Pseudo-labeling With Conformer And Initialization Strategy (2021)
- SlimIPL: Language-model-free Iterative Pseudo-labeling (2020)
- Multi-task Pseudo-label Learning For Non-intrusive Speech Quality Assessment Model (2023)
- Alternative Pseudo-labeling For Semi-supervised Automatic Speech Recognition (2023)
- Improving Mispronunciation Detection With Wav2vec2-based Momentum Pseudo-labeling For Accentedness And Intelligibility Assessment (2022)
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)
- End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019)