An Analysis Of Incorporating An External Language Model Into A Sequence-to-sequence Model
2017 · Anjuli Kannan, Yonghui Wu, Patrick Nguyen, et al.
Abstract
Attention-based sequence-to-sequence models for automatic speech recognition jointly train an acoustic model, language model, and alignment mechanism. Thus, the language model component is only trained on transcribed audio-text pairs. This leads to the use of shallow fusion with an external language model at inference time. Shallow fusion refers to log-linear interpolation with a separately trained language model at each step of the beam search. In this work, we investigate the behavior of shallow fusion across a range of conditions: different types of language models, different decoding units, and different tasks. On Google Voice Search, we demonstrate that the use of shallow fusion with a neural LM with wordpieces yields a 9.1% relative word error rate reduction (WERR) over our competitive attention-based sequence-to-sequence model, obviating the need for second-pass rescoring.
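As an illustration of the log-linear interpolation described in the abstract, here is a minimal sketch of shallow fusion inside a beam search. At every step each candidate token is scored as log p_s2s(y | x, prefix) + lm_weight * log p_lm(y | prefix). Everything named here is an assumption made for illustration and does not come from the paper: the function `shallow_fusion_beam_search`, the toy scorers `toy_s2s` and `toy_lm`, their probabilities, and the interpolation weight of 0.5. The sketch also leaves out refinements a production decoder would likely use, such as length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps a token prefix to log-probabilities
# over the next token. Both scorers are assumed to share one vocabulary.
Scorer = Callable[[Tuple[str, ...]], Dict[str, float]]
Hypothesis = Tuple[Tuple[str, ...], float]


def shallow_fusion_beam_search(
    s2s: Scorer,
    lm: Scorer,
    lm_weight: float = 0.5,   # illustrative value, not taken from the paper
    beam_size: int = 2,
    max_len: int = 5,
    eos: str = "</s>",
) -> Hypothesis:
    """Return the best (tokens, score) under the fused score."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in beams:
            s2s_logp = s2s(prefix)
            lm_logp = lm(prefix)
            for tok, lp in s2s_logp.items():
                # Log-linear interpolation applied at every decoding step.
                fused = lp + lm_weight * lm_logp[tok]
                candidates.append((prefix + (tok,), score + fused))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            (finished if hyp[0][-1] == eos else beams).append(hyp)
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


def _dist(probs: Dict[str, float]) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in probs.items()}


def toy_s2s(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Toy model: the audio is ambiguous between "two" and "too".
    if not prefix:
        return _dist({"two": 0.45, "too": 0.50, "</s>": 0.05})
    return _dist({"two": 0.05, "too": 0.05, "</s>": 0.90})


def toy_lm(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Toy external LM: text-only evidence prefers "two".
    if not prefix:
        return _dist({"two": 0.70, "too": 0.20, "</s>": 0.10})
    return _dist({"two": 0.05, "too": 0.05, "</s>": 0.90})


print(shallow_fusion_beam_search(toy_s2s, toy_lm, lm_weight=0.0)[0])  # ('too', '</s>')
print(shallow_fusion_beam_search(toy_s2s, toy_lm, lm_weight=0.5)[0])  # ('two', '</s>')
```

With the weight set to 0 the decoder follows the sequence-to-sequence scores alone and picks "too". With a nonzero weight the external LM tips the ambiguous first token toward "two". In practice the weight would be tuned on held-out data.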
Related papers
- Learn Spelling From Teachers: Transferring Knowledge From Language Models To Sequence-to-sequence Speech Recognition (2019)
- Language Model Integration Based On Memory Control For Sequence To Sequence Speech Recognition (2018)
- Transfer Learning Of Language-independent End-to-end ASR With Language Model Fusion (2018)
- Delayed Fusion: Integrating Large Language Models Into First-pass Decoding In End-to-end Speech Recognition (2025)
- Towards Better Decoding And Language Model Integration In Sequence To Sequence Models (2016)
- Iterative Shallow Fusion Of Backward Language Model For End-to-end Speech Recognition (2023)
- On Language Model Integration For RNN Transducer Based Speech Recognition (2021)
- Multilingual And Fully Non-autoregressive ASR With Large Language Model Fusion: A Comprehensive Study (2024)