On Language Model Integration For RNN Transducer Based Speech Recognition
2021 · Wei Zhou, Zuoyun Zheng, Ralf Schlüter, et al.
Abstract
The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of the RNN-Transducer (RNN-T) can limit the performance of LM integration methods such as simple shallow fusion. A Bayesian interpretation suggests removing this sequence prior, which is referred to as ILM correction. In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. We provide a decoding interpretation of two major reasons for the performance improvement obtained with ILM correction, which is further verified experimentally with detailed analysis. We also propose an exact-ILM training framework by extending the proof given for the hybrid autoregressive transducer, which enables a theoretical justification for other ILM approaches. Systematic comparison is conducted for both in-domain and cross-domain evaluation on the LibriSpeech and TED-LIUM Release 2 corpora, respectively. Our proposed exact-ILM training can further improve the best ILM method.
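The shallow fusion and ILM correction named in the abstract are commonly written as a log-linear combination of sequence scores: the RNN-T score plus a scaled external LM score, minus a scaled ILM score. The following is a minimal sketch of that combination for rescoring a hypothesis list, not the paper's implementation. The function names, the scale values, and the toy log-probabilities are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, scales and scores are assumptions,
# not values or code from the paper.

def fused_score(log_p_rnnt, log_p_lm, log_p_ilm, lm_scale, ilm_scale):
    """Log-linear score combination.

    ilm_scale = 0.0 gives plain shallow fusion;
    ilm_scale > 0.0 additionally subtracts the (estimated) internal LM score.
    """
    return log_p_rnnt + lm_scale * log_p_lm - ilm_scale * log_p_ilm


def best_hypothesis(hyps, lm_scale, ilm_scale):
    """Return the hypothesis with the highest combined score."""
    return max(
        hyps,
        key=lambda h: fused_score(
            h["log_p_rnnt"], h["log_p_lm"], h["log_p_ilm"], lm_scale, ilm_scale
        ),
    )


# Toy hypotheses (made-up log-probabilities). "source_like" is favoured by
# the internal LM, "target_like" is favoured by the external LM.
hyps = [
    {"text": "source_like", "log_p_rnnt": -4.0, "log_p_lm": -12.0, "log_p_ilm": -3.0},
    {"text": "target_like", "log_p_rnnt": -4.5, "log_p_lm": -8.0, "log_p_ilm": -9.0},
]

shallow_fusion_pick = best_hypothesis(hyps, lm_scale=0.1, ilm_scale=0.0)
ilm_corrected_pick = best_hypothesis(hyps, lm_scale=0.1, ilm_scale=0.1)
print(shallow_fusion_pick["text"], ilm_corrected_pick["text"])
```

In this toy setup the two settings select different hypotheses, which illustrates how subtracting the ILM score can change a decoding decision. In practice the scales would be tuned on held-out data, and the ILM score must itself be estimated from the RNN-T model.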
Related papers
- An Empirical Study Of Language Model Integration For Transducer Based Speech Recognition (2022) · 3.58
- On The Relation Between Internal Language Model And Sequence Discriminative Training For Neural Transducers (2023) · 0.00
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020) · 8.82
- Internal Language Model Training For Domain-adaptive End-to-end Speech Recognition (2021) · 11.39
- Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020) · 13.44
- Investigating Methods To Improve Language Model Integration For Attention-based Encoder-decoder ASR Models (2021) · 0.00
- Transducer-Llama: Integrating LLMs Into Streamable Transducer-based Speech Recognition (2024) · 3.58
- Transformer Language Models With LSTM-based Cross-utterance Information Representation (2021) · 10.48