Full-sum Decoding For Hybrid HMM Based Speech Recognition Using LSTM Language Model
2020 · Wei Zhou, Ralf Schlüter, Hermann Ney
Abstract
In hybrid HMM-based speech recognition, LSTM language models have been widely applied and have achieved large improvements. Their theoretical capability of modeling unlimited context suggests that no recombination should be applied in decoding. This motivates reconsidering full summation over the HMM-state sequences instead of the Viterbi approximation in decoding. We explore the potential gain from more accurate probabilities in terms of decision making and apply full-sum decoding with a modified prefix-tree search framework. The proposed full-sum decoding is evaluated on both the Switchboard and Librispeech corpora. Different models using CE and sMBR training criteria are used. Additionally, both MAP and confusion network decoding, as approximate variants of the general Bayes decision rule, are evaluated. Consistent improvements over strong baselines are achieved in almost all cases without extra cost. We also discuss tuning effort, efficiency, and some limitations of full-sum decoding.
Related papers
- HMM Vs. CTC For Automatic Speech Recognition: Comparison Based On Full-sum Training From Scratch (2022)
- Single Headed Attention Based Sequence-to-sequence Model For State-of-the-art Results On Switchboard (2020)
- LSTM-LM With Long-term History For First-pass Decoding In Conversational Speech Recognition (2020)
- On Lattice-free Boosted MMI Training Of HMM And CTC-based Full-context ASR Models (2021)
- Language Modeling With Highway LSTM (2017)
- Delayed Fusion: Integrating Large Language Models Into First-pass Decoding In End-to-end Speech Recognition (2025)
- Phoneme Based Neural Transducer For Large Vocabulary Speech Recognition (2020)
- A Comparison Of Techniques For Language Model Integration In Encoder-decoder Speech Recognition (2018)