Relaxing The Conditional Independence Assumption Of CTC-based ASR By Conditioning On Intermediate Predictions
2021 · Jumon Nozaki, Tatsuya Komatsu
Abstract
This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models. We train a CTC-based ASR model with auxiliary CTC losses in intermediate layers in addition to the original CTC loss in the last layer. During both training and inference, the prediction generated at each intermediate layer is added to the input of the next layer, so that the prediction of the last layer is conditioned on those intermediate predictions. Our method is easy to implement and retains the merits of CTC-based ASR: a simple model architecture and fast decoding speed. We conduct experiments on three different ASR corpora. Our proposed method significantly improves on a standard CTC model (e.g., more than 20% relative word error rate reduction on the WSJ corpus) with little computational overhead. Moreover, for the TEDLIUM2 corpus and the AISHELL-1 corpus, it achieves performance comparable to a strong autoregress
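The conditioning step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the toy per-frame layers, the shared output projection `w_out`, the back-projection `w_back`, the choice of which layers to condition after, and the loss weight in `combined_loss` are all assumptions introduced here for illustration. The CTC loss itself is not computed; `combined_loss` only shows one plausible way to mix the last-layer loss with the auxiliary ones.

```python
import math
import random


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def forward(frames, layers, w_out, w_back, cond_after):
    """Run a toy encoder with intermediate conditioning.

    frames:     list of D-dim feature vectors, one per time step
    layers:     list of D x D matrices (toy stand-ins for encoder blocks;
                a real encoder would also mix information across time)
    w_out:      V x D projection to vocabulary logits (assumed shared by all heads)
    w_back:     D x V projection from posteriors back to the hidden size (assumption)
    cond_after: set of 1-based layer indices after which to condition
    Returns (final_posteriors, list_of_intermediate_posteriors).
    """
    intermediate = []
    hidden = frames
    for idx, w in enumerate(layers, start=1):
        # Toy residual layer applied to each frame independently.
        hidden = [[x + math.tanh(y) for x, y in zip(h, matvec(w, h))] for h in hidden]
        if idx in cond_after and idx < len(layers):
            # Intermediate prediction: per-frame posterior over the vocabulary.
            post = [softmax(matvec(w_out, h)) for h in hidden]
            intermediate.append(post)
            # Add the projected prediction to the input of the next layer.
            hidden = [[x + b for x, b in zip(h, matvec(w_back, p))]
                      for h, p in zip(hidden, post)]
    final = [softmax(matvec(w_out, h)) for h in hidden]
    return final, intermediate


def combined_loss(final_loss, intermediate_losses, weight=0.5):
    """Hypothetical mix of last-layer and averaged auxiliary CTC losses."""
    aux = sum(intermediate_losses) / len(intermediate_losses)
    return (1.0 - weight) * final_loss + weight * aux


rng = random.Random(0)
D, V, T = 4, 3, 5  # hidden size, vocabulary size (incl. blank), number of frames
frames = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
layers = [rand_matrix(D, D, rng) for _ in range(4)]
w_out = rand_matrix(V, D, rng)
w_back = rand_matrix(D, V, rng)

final, inter = forward(frames, layers, w_out, w_back, cond_after={2})
print(len(final), len(inter))  # number of frames, number of intermediate heads
```

Because the same forward pass is used in training and inference, decoding still needs only a single pass through the encoder; the extra cost is the two small projections at each conditioned layer.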
Related papers
- Hierarchical Conditional End-to-end ASR With CTC And Multi-granular Subword Units (2021)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)
- Alternate Intermediate Conditioning With Syllable-level And Character-level Targets For Japanese ASR (2022)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020)
- Non-autoregressive Error Correction For CTC-based ASR With Phone-conditioned Masked LM (2022)
- Reducing Spelling Inconsistencies In Code-switching ASR Using Contextualized CTC Loss (2020)
- AdaMER-CTC: Connectionist Temporal Classification With Adaptive Maximum Entropy Regularization For Automatic Speech Recognition (2024)