LSTM-LM With Long-term History For First-pass Decoding In Conversational Speech Recognition
2020 · Xie Chen, Sarangarajan Parthasarathy, William Gale, et al.
Abstract
LSTM language models (LSTM-LMs) have proven powerful and have yielded significant performance improvements over count-based n-gram LMs in modern speech recognition systems. Due to their infinite history states and computational load, most previous studies focus on applying LSTM-LMs in the second pass for rescoring. Recent work shows that it is feasible and computationally affordable to adopt LSTM-LMs in first-pass decoding within a dynamic (or tree-based) decoder framework. In this work, the LSTM-LM is composed with a WFST decoder on the fly for first-pass decoding. Furthermore, motivated by the long-term history nature of LSTM-LMs, the use of context beyond the current utterance is explored for first-pass decoding in conversational speech recognition. The context information is captured by the hidden states of the LSTM-LM across utterances and can be used to guide the first-pass search effectively. The experimental results in our internal meeting transcription
Related papers
- Full-sum Decoding For Hybrid HMM Based Speech Recognition Using LSTM Language Model (2020)
- Context-aware RNNLM Rescoring For Conversational Speech Recognition (2020)
- Transformer Language Models With LSTM-based Cross-utterance Information Representation (2021)
- Language Modeling With Highway LSTM (2017)
- Large Language Model Can Transcribe Speech In Multi-talker Scenarios With Versatile Instructions (2024)
- Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers (2021)
- Zero-resource Speech Translation And Recognition With LLMs (2024)
- End-to-end Speech Recognition Contextualization With Large Language Models (2023)