An Empirical Study Of Language Model Integration For Transducer Based Speech Recognition
2022 · Huahuan Zheng, Keyu An, Zhijian Ou, et al.
Abstract
Utilizing text-only data with an external language model (ELM) in end-to-end RNN-Transducer (RNN-T) for speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal language model estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that RNN-T posterior should first subtract the implicitly learned internal language model (ILM) prior, in order to integrate the ELM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained neural language model with full context, which may be inappropriate for the estimation of ILM and deteriorate the integration performance. Based on the DR method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model. Extensive empirical experiments are conducted on both in-domain and cross-domain scenarios on English
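The score combinations described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used log-linear forms: shallow fusion adds a weighted ELM log-probability to the RNN-T score, while density ratio additionally subtracts a weighted source-domain LM log-probability as a stand-in for the ILM. Under that assumption, LODR would use the same formula with a low-order n-gram as the subtracted LM. The function names, weights, and toy probabilities below are hypothetical and are not taken from the paper.

```python
import math


def shallow_fusion(log_p_rnnt, log_p_elm, elm_weight):
    """SF: add the weighted external LM score to the RNN-T score."""
    return log_p_rnnt + elm_weight * log_p_elm


def density_ratio(log_p_rnnt, log_p_src_lm, log_p_elm, src_weight, elm_weight):
    """DR: subtract a weighted source-domain LM score (an estimate of the
    ILM prior), then add the weighted external LM score.

    For LODR (as assumed here), log_p_src_lm would come from a low-order
    n-gram LM instead of a full-context neural LM; the formula is unchanged.
    """
    return log_p_rnnt - src_weight * log_p_src_lm + elm_weight * log_p_elm


def rescore(hyps, src_weight, elm_weight):
    """Return the best hypothesis name and all fused scores."""
    scores = {
        name: density_ratio(h["rnnt"], h["src"], h["elm"], src_weight, elm_weight)
        for name, h in hyps.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy per-hypothesis probabilities (illustrative values only).
hyps = {
    "hyp_a": {"rnnt": math.log(0.5), "src": math.log(0.4), "elm": math.log(0.1)},
    "hyp_b": {"rnnt": math.log(0.3), "src": math.log(0.1), "elm": math.log(0.3)},
}

best_no_lm, _ = rescore(hyps, src_weight=0.0, elm_weight=0.0)
best_fused, fused_scores = rescore(hyps, src_weight=0.5, elm_weight=0.5)
print(best_no_lm, best_fused)
```

In this toy setup the RNN-T alone prefers `hyp_a`, but subtracting the source LM score and adding the ELM score shifts the ranking toward `hyp_b`, which the target-domain ELM favors. In practice the two weights are tuned on a development set.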
Related papers
- A Density Ratio Approach To Language Model Fusion In End-to-end Automatic Speech Recognition (2020)
- On Language Model Integration For RNN Transducer Based Speech Recognition (2021)
- Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020)
- Internal Language Model Training For Domain-adaptive End-to-end Speech Recognition (2021)
- Integrating Text Inputs For Training And Adapting RNN Transducer ASR Models (2022)
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020)
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023)
- Internal Language Model Estimation Based Adaptive Language Model Fusion For Domain Adaptation (2022)