Non-autoregressive Error Correction For CTC-based ASR With Phone-conditioned Masked LM
2022 · Hayato Futami, Hirofumi Inaguma, Sei Ueno, et al.
Abstract
Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature. To take advantage of text-only data, language model (LM) integration approaches such as rescoring and shallow fusion have been widely used for CTC. However, they lose CTC's non-autoregressive nature because they require beam search, which slows down inference. In this study, we propose an error correction method with a phone-conditioned masked LM (PC-MLM). In the proposed method, less confident word tokens in the greedy-decoded output from CTC are masked. PC-MLM then predicts these masked word tokens given the unmasked words and phones supplementally predicted from CTC. We further extend it to Deletable PC-MLM in order to address insertion errors. Since both CTC and PC-MLM are non-autoregressive models, the method enables fast LM integration. Experimental evaluations on the Corpus of Spontaneous Japanese (CSJ) and TED-LIUM2 in d
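The first two steps the abstract describes, greedy CTC decoding followed by masking of less confident tokens, could be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the blank index, the `<mask>` string, the 0.9 threshold, and the choice of token confidence (the highest frame posterior within a run of repeated labels) are all hypothetical, and the PC-MLM that would fill the masks is not modeled here.

```python
from typing import Dict, List, Tuple

BLANK = 0        # assumed index of the CTC blank symbol
MASK = "<mask>"  # assumed mask token string


def ctc_greedy_decode(posteriors: List[List[float]]) -> List[Tuple[int, float]]:
    """Collapse frame-wise argmax labels into (token_id, confidence) pairs.

    Confidence is taken as the highest frame posterior within each run of
    repeated labels. That is an assumption made for this sketch.
    """
    tokens: List[Tuple[int, float]] = []
    prev = BLANK
    for frame in posteriors:
        label = max(range(len(frame)), key=frame.__getitem__)
        prob = frame[label]
        if label != BLANK:
            if label == prev:
                # Still inside the same run: keep the most confident frame.
                token_id, conf = tokens[-1]
                tokens[-1] = (token_id, max(conf, prob))
            else:
                tokens.append((label, prob))
        prev = label
    return tokens


def mask_low_confidence(
    tokens: List[Tuple[int, float]],
    vocab: Dict[int, str],
    threshold: float = 0.9,  # hypothetical value
) -> List[str]:
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [vocab[t] if c >= threshold else MASK for t, c in tokens]


# Toy example: five frames over {blank, "a", "b", "c"}.
example_vocab = {1: "a", 2: "b", 3: "c"}
example_posteriors = [
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.95, 0.00, 0.00],
    [0.90, 0.05, 0.05, 0.00],
    [0.20, 0.10, 0.60, 0.10],
    [0.00, 0.00, 0.01, 0.99],
]
decoded = ctc_greedy_decode(example_posteriors)
masked = mask_low_confidence(decoded, example_vocab)
print(masked)  # ['a', '<mask>', 'c']
```

In the method as the abstract describes it, the masked positions would then be predicted in parallel by PC-MLM, conditioned on the unmasked words and on phones predicted from CTC, so no beam search is involved.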
Related papers
- Acoustic-aware Non-autoregressive Spell Correction With Mask Sample Decoding (2022) 0.00
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020) 11.76
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024) 6.30
- Joint Masked CPC And CTC Training For ASR (2020) 8.60
- Multilingual Training And Cross-lingual Adaptation On CTC-based Acoustic Model (2017) 0.00
- Mask The Bias: Improving Domain-adaptive Generalization Of CTC-based ASR With Internal Language Model Estimation (2023) 3.58
- Chain Of Correction For Full-text Speech Recognition With Large Language Models (2025) 0.00
- AdaMER-CTC: Connectionist Temporal Classification With Adaptive Maximum Entropy Regularization For Automatic Speech Recognition (2024) 5.84