CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition
2024 · Zengwei Yao, Wei Kang, Xiaoyu Yang, et al.
Abstract
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we propose the Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. We provide in-depth insights into its essential behaviors from three perspectives: 1) it conducts self-distillation between random pairs of sub-models that process different augmented views; 2) it learns contextual representation through masked prediction for positions within time-masked regions, especially when we increase the amount of time masking; 3) it suppresses the extremely peaky CTC distributions, thereby reducing overfitting and improving the generalization ability. Extensive experiments on LibriSpeech, Aishell-1, and GigaSpeech datasets demonstrate the effec
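The consistency term described in the abstract can be illustrated with a minimal, framework-free sketch. The abstract only says that consistency is enforced between two CTC distributions from different augmented views; the choice of a symmetric per-frame KL divergence, the equal averaging of the two CTC losses, the weight `alpha`, and all function names below are assumptions made for illustration, not details confirmed by the text above. The CTC losses themselves are taken as precomputed numbers, and a real training setup would also need to decide where gradients are stopped, which plain Python cannot express.

```python
import math


def softmax(logits):
    """Turn one frame's logits into a probability distribution over tokens."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def consistency_loss(logits_a, logits_b):
    """Assumed consistency term: symmetric KL per frame, summed over frames.

    logits_a and logits_b are lists of frames (one list of logits per frame)
    produced by the same model from two augmented views of one utterance.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("both views must have the same number of frames")
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        p_a, p_b = softmax(frame_a), softmax(frame_b)
        total += 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))
    return total


def cr_ctc_objective(ctc_loss_a, ctc_loss_b, logits_a, logits_b, alpha):
    """Hypothetical combined objective: mean CTC loss plus weighted consistency.

    alpha is an illustrative hyperparameter; no value is given in the abstract.
    """
    return 0.5 * (ctc_loss_a + ctc_loss_b) + alpha * consistency_loss(logits_a, logits_b)


if __name__ == "__main__":
    view_a = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]  # 2 frames, 3 tokens
    view_b = [[0.0, 1.5, 0.3], [0.3, 0.2, 0.1]]
    print(consistency_loss(view_a, view_b))
    print(cr_ctc_objective(2.0, 4.0, view_a, view_b, alpha=0.5))
```

When both views yield identical frame distributions the consistency term is zero and the objective reduces to the averaged CTC loss; any disagreement between the views adds a positive penalty.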
Related papers
- AdaMER-CTC: Connectionist Temporal Classification With Adaptive Maximum Entropy Regularization For Automatic Speech Recognition (2024) 5.84
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022) 0.00
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017) 0.00
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020) 11.76
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017) 0.00
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023) 11.36
- Efficient CTC Regularization Via Coarse Labels For End-to-end Speech Translation (2023) 0.00
- Non-autoregressive Error Correction For CTC-based ASR With Phone-conditioned Masked LM (2022) 5.84