Semi-supervised Speech Recognition Via Graph-based Temporal Classification
2020 · Niko Moritz, Takaaki Hori, Jonathan Le Roux
Abstract
Semi-supervised learning has demonstrated promising results in automatic speech recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for unlabeled data. The effectiveness of this approach largely relies on the pseudo-label accuracy, for which typically only the 1-best ASR hypothesis is used. However, alternative ASR hypotheses of an N-best list can provide more accurate labels for an unlabeled speech utterance and also reflect uncertainties of the seed ASR model. In this paper, we propose a generalized form of the connectionist temporal classification (CTC) objective that accepts a graph representation of the training labels. The newly proposed graph-based temporal classification (GTC) objective is applied for self-training with WFST-based supervision, which is generated from an N-best list of pseudo-labels. In this setup, GTC is used to learn not only a temporal alignment, similarly to CTC, but also a label alignment to obtain the optimal pseudo-label …
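The core idea of a graph-based CTC objective can be illustrated with a small forward-algorithm sketch. This is not the authors' implementation; it assumes output index 0 is the blank, builds the label graph as a plain union of one CTC topology per N-best hypothesis (a compact WFST would share common prefixes instead), and lets each hypothesis weight (assumed positive, e.g. normalized N-best scores) act as the weight of that branch's start edges. The names `build_nbest_graph` and `gtc_loss` are hypothetical. The loss sums over every path through the graph, so it marginalizes over both frame alignments and the choice of hypothesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

BLANK = 0  # assumption: output index 0 is the CTC blank symbol

NBest = List[Tuple[List[int], float]]  # (token ids, hypothesis weight)


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def build_nbest_graph(nbest: NBest):
    """Hypothetical helper: union of one CTC topology per hypothesis.

    Returns node labels, incoming edges per node as (source, log_weight),
    log start weights, and the list of accepting (end) nodes.
    """
    labels: List[int] = []
    preds: Dict[int, List[Tuple[int, float]]] = {}
    start: Dict[int, float] = {}
    ends: List[int] = []
    for tokens, weight in nbest:
        base = len(labels)
        seq = [BLANK]
        for tok in tokens:
            seq += [tok, BLANK]  # blank, t1, blank, t2, ..., blank
        labels += seq
        for i, lab in enumerate(seq):
            node = base + i
            incoming = [(node, 0.0)]  # self-loop: repeat the same symbol
            if i >= 1:
                incoming.append((node - 1, 0.0))  # advance to the next node
            if i >= 2 and lab != BLANK and seq[i - 2] != lab:
                incoming.append((node - 2, 0.0))  # skip a blank between distinct tokens
            preds[node] = incoming
        # The hypothesis weight enters as the weight of the branch's start edges.
        start[base] = math.log(weight)
        if tokens:
            start[base + 1] = math.log(weight)
            ends += [base + len(seq) - 2, base + len(seq) - 1]
        else:
            ends.append(base)
    return labels, preds, start, ends


def gtc_loss(log_probs: List[List[float]], nbest: NBest) -> float:
    """-log p(G | X): forward algorithm summed over every path through the graph.

    log_probs[t][k] is the log posterior of output symbol k at frame t.
    """
    labels, preds, start, ends = build_nbest_graph(nbest)
    n = len(labels)
    # Frame 0: enter the graph at a start node and emit that node's label.
    alpha = [start.get(g, -math.inf) + log_probs[0][labels[g]] for g in range(n)]
    for t in range(1, len(log_probs)):
        alpha = [
            logsumexp([alpha[src] + w for src, w in preds[g]]) + log_probs[t][labels[g]]
            for g in range(n)
        ]
    return -logsumexp([alpha[g] for g in ends])
```

Because the branches in this sketch are disjoint, `gtc_loss(log_probs, [([1, 2], 0.6), ([1, 3], 0.4)])` equals `-log(0.6 * p_CTC([1, 2] | X) + 0.4 * p_CTC([1, 3] | X))`, and a single hypothesis with weight 1.0 reduces to the ordinary CTC loss.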
Related papers
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019)
- Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition With Imperfect Transcripts (2023)
- Star Temporal Classification: Sequence Classification With Partially Labeled Data (2022)
- End-to-end ASR: From Supervised To Semi-supervised Learning With Modern Architectures (2019)
- Joint Masked CPC And CTC Training For ASR (2020)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)