HMM Vs. CTC For Automatic Speech Recognition: Comparison Based On Full-sum Training From Scratch
2022 · Tina Raissi, Wei Zhou, Simon Berger, et al.
Abstract
In this work, we compare from-scratch sequence-level cross-entropy (full-sum) training of Hidden Markov Model (HMM) and Connectionist Temporal Classification (CTC) topologies for automatic speech recognition (ASR). Besides accuracy, we further analyze their capability for generating high-quality time alignments between the speech signal and the transcription, which can be crucial for many subsequent applications. Moreover, we propose several methods to improve the convergence of from-scratch full-sum training by addressing the alignment modeling issue. A systematic comparison is conducted on both the Switchboard and LibriSpeech corpora across CTC, posterior HMM with and without transition probabilities, and the standard hybrid HMM. We also provide a detailed analysis of both Viterbi forced alignment and Baum-Welch full-sum occupation probabilities.
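To make the distinction between the full-sum and Viterbi quantities named in the abstract concrete, the following is a minimal sketch of the forward recursion over a CTC label topology. It is an illustration under stated assumptions, not the authors' implementation: the function names `ctc_score` and `logsumexp`, the blank index 0, and the toy uniform posteriors are all hypothetical. It covers only the CTC topology without transition probabilities, and it computes only the forward score. Baum-Welch occupation probabilities would additionally require a backward pass.

```python
import math

NEG_INF = float("-inf")


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_score(log_probs, labels, combine, blank=0):
    """Forward recursion over the CTC topology (hypothetical helper).

    log_probs: T frames, each a list of per-label log-posteriors.
    labels:    target label indices without blanks.
    combine:   logsumexp gives the full-sum score (all alignments),
               max gives the Viterbi score (single best alignment).
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]
    S = len(ext)

    # Initialisation: an alignment may start in the first blank or first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [NEG_INF] * S
        for s in range(S):
            preds = [alpha[s]]                # stay in the same state
            if s >= 1:
                preds.append(alpha[s - 1])    # advance by one state
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                preds.append(alpha[s - 2])
            new[s] = combine(preds) + frame[ext[s]]
        alpha = new

    # An alignment may end in the last label or the final blank.
    finals = [alpha[S - 1]] + ([alpha[S - 2]] if S > 1 else [])
    return combine(finals)


# Toy input (assumption): 2 frames, 2 labels (0 = blank, 1 = "a"), uniform posteriors.
uniform = [[math.log(0.5)] * 2 for _ in range(2)]
full_sum = ctc_score(uniform, [1], logsumexp)  # sums the 3 alignments of "a": 0.75
viterbi = ctc_score(uniform, [1], max)         # single best alignment: 0.25
```

In this toy case the three valid alignments (a-blank, blank-a, a-a) each have probability 0.25, so the full-sum score is log 0.75 while the Viterbi score is log 0.25. Full-sum training would minimize the negative of the first quantity, whereas a forced alignment corresponds to the path behind the second.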
Related papers
- Investigating The Effect Of Label Topology And Training Criterion On ASR Performance And Alignment Quality (2024)
- Full-sum Decoding For Hybrid HMM Based Speech Recognition Using LSTM Language Model (2020)
- On Lattice-free Boosted MMI Training Of HMM And CTC-based Full-context ASR Models (2021)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)
- CTC-segmentation Of Large Corpora For German End-to-end Speech Recognition (2020)
- Comparison Of Decoding Strategies For CTC Acoustic Models (2017)
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017)