Consistent Training And Decoding For End-to-end Speech Recognition Using Lattice-free MMI
2021 · Jinchuan Tian, Jianwei Yu, Chao Weng, et al.
Abstract
Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks. However, Lattice-Free Maximum Mutual Information (LF-MMI), as one of the discriminative training criteria that show superior performance in hybrid ASR systems, is rarely adopted in E2E ASR frameworks. In this work, we propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages. The proposed approach shows its effectiveness on two of the most widely used E2E frameworks including Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements on various datasets and different E2E ASR frameworks. The best of our models achieves competitive CER of 4.1% / 4.4% on Aishell-1 dev/test set; we also achieve significant error reduction on Aishell-2 and Librispeech datasets over strong baseli
Related papers
- On Lattice-free Boosted MMI Training Of HMM And CTC-based Full-context ASR Models (2021)
- Unsupervised Model-based Speaker Adaptation Of End-to-end Lattice-free MMI Model For Speech Recognition (2022)
- A Comparison Of Lattice-free Discriminative Training Criteria For Purely Sequence-trained Neural Network Acoustic Models (2018)
- Simplified End-to-end MMI Training And Voting For ASR (2017)
- Comparison Of Lattice-free And Lattice-based Sequence Discriminative Training Criteria For LVCSR (2019)
- A Novel Pyramidal-FSMN Architecture With Lattice-free MMI For Speech Recognition (2018)
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023)
- Lattice-based Lightly-supervised Acoustic Model Training (2019)