AdaMER-CTC: Connectionist Temporal Classification With Adaptive Maximum Entropy Regularization For Automatic Speech Recognition
2024 · Soohwan Eom, Eunseop Yoon, Hee Suk Yoon, et al.
Abstract
In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during the training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but ensures that as training proceeds, predictions display the desired model confidence.
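To make the idea of a training-time-varying entropy weight concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state how AdaMER adapts the weight, so `linear_decay_weight` is a purely hypothetical placeholder schedule, and the entropy is computed per frame over the output distributions as a simplification. The function names, the default `start` and `end` values, and the assumption that a CTC loss value is computed elsewhere are all illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) of the per-frame output distributions.

    Simplifying assumption: entropy is taken over each frame's label
    distribution, which may differ from the quantity used in the paper.
    """
    total = 0.0
    for logits in frame_logits:
        probs = softmax(logits)
        total -= sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(frame_logits)


def regularized_loss(ctc_loss, frame_logits, weight):
    """CTC loss minus a weighted entropy term.

    Subtracting the entropy rewards less peaky output distributions.
    `ctc_loss` is assumed to be computed elsewhere.
    """
    return ctc_loss - weight * mean_frame_entropy(frame_logits)


def linear_decay_weight(step, total_steps, start=0.2, end=0.0):
    """Hypothetical schedule: interpolate the weight linearly over training.

    This only stands in for 'a weight that changes during training';
    it is not the adaptive rule proposed by the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac
```

With a constant `weight`, `regularized_loss` corresponds to the fixed-weight setup the abstract attributes to earlier work; passing a different `weight` at each step is the only change needed to vary the regularization strength over training.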
Related papers
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024) 6.30
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020) 11.76
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021) 5.84
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023) 11.36
- Non-autoregressive Error Correction For CTC-based ASR With Phone-conditioned Masked LM (2022) 5.84
- Multilingual Training And Cross-lingual Adaptation On CTC-based Acoustic Model (2017) 0.00
- Audio Adversarial Examples For Robust Hybrid CTC/attention Speech Recognition (2020) 3.58
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017) 0.00