A Novel Pyramidal-FSMN Architecture With Lattice-free MMI For Speech Recognition
2018 · Xuerui Yang, Jiwei Li, Xi Zhou
Abstract
Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks. Building on this work, we propose a novel network architecture that introduces a pyramidal memory structure to represent different context information in different layers. In addition, res-CNN layers are added at the front of the network to extract more sophisticated features. Trained jointly with the lattice-free maximum mutual information (LF-MMI) and cross entropy (CE) criteria, this approach achieves word error rates (WERs) of 3.62% on Librispeech and 10.89% on LDC97S62 (Switchboard 300 hours). Furthermore, applying recurrent neural network language model (RNNLM) rescoring yields a WER of 2.97% on Librispeech.
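The abstract does not spell out the memory structure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the usual FSMN-style memory block, in which each frame is augmented with a weighted sum of past and future frames sampled at a fixed stride, and it simplifies the tap weights to scalars. The function names, tap values, and the per-layer configuration `PYRAMID` are hypothetical and chosen purely for illustration; the only point is that deeper layers can be given a wider temporal context than shallower ones.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def memory_block(
    h: Sequence[Frame],
    left: Sequence[float],
    right: Sequence[float],
    stride: int = 1,
) -> List[Frame]:
    """FSMN-style memory block with scalar taps (a simplification).

    m[t] = h[t] + sum_i left[i]  * h[t - i * stride]        (i = 0, 1, ...)
                + sum_j right[j] * h[t + (j + 1) * stride]  (j = 0, 1, ...)
    Frames outside the sequence are treated as zeros.
    """
    num_frames = len(h)
    out: List[Frame] = []
    for t in range(num_frames):
        m = list(h[t])  # identity (skip) term
        # Look-back taps; tap 0 weights the current frame.
        for i, a in enumerate(left):
            s = t - i * stride
            if s >= 0:
                m = [mv + a * hv for mv, hv in zip(m, h[s])]
        # Look-ahead taps; the first one is `stride` frames ahead.
        for j, c in enumerate(right, start=1):
            s = t + j * stride
            if s < num_frames:
                m = [mv + c * hv for mv, hv in zip(m, h[s])]
        out.append(m)
    return out


def layer_context(left_taps: int, right_taps: int, stride: int) -> Tuple[int, int]:
    """Return (frames of look-back, frames of look-ahead) for one layer."""
    return (left_taps - 1) * stride, right_taps * stride


# Hypothetical pyramid: (left taps, right taps, stride) per layer, bottom to top.
# These numbers are illustrative assumptions, not values from the paper.
PYRAMID = [(3, 1, 1), (5, 2, 1), (5, 2, 2), (9, 4, 2)]

if __name__ == "__main__":
    for depth, (n_left, n_right, stride) in enumerate(PYRAMID, start=1):
        back, ahead = layer_context(n_left, n_right, stride)
        print(f"layer {depth}: look-back {back} frames, look-ahead {ahead} frames")
```

With this hypothetical configuration the context seen by a single layer grows with depth, which is the sense of "pyramidal" assumed here. A real DFSMN block would use learned vector-valued taps and projection layers rather than fixed scalars.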
Related papers
- Deep Feed-forward Sequential Memory Networks For Speech Synthesis (2018)
- Consistent Training And Decoding For End-to-end Speech Recognition Using Lattice-free MMI (2021)
- Bayesian Learning Of LF-MMI Trained Time Delay Neural Networks For Speech Recognition (2020)
- DFSMN-SAN With Persistent Memory Model For Automatic Speech Recognition (2019)
- Lattice Rescoring Strategies For Long Short Term Memory Language Models In Speech Recognition (2017)
- Language Model Integration Based On Memory Control For Sequence To Sequence Speech Recognition (2018)
- Bidirectional Quaternion Long-short Term Memory Recurrent Neural Networks For Speech Recognition (2018)
- A Comparison Of Lattice-free Discriminative Training Criteria For Purely Sequence-trained Neural Network Acoustic Models (2018)