Deep LSTM For Large Vocabulary Continuous Speech Recognition
2017 · Xu Tian, Jun Zhang, Zejun Ma, et al.
Abstract
Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, deeper networks are more difficult to train. We introduce a training framework with layer-wise training and exponential moving average methods for deeper LSTM models. With this competitive framework, LSTM models of more than 7 layers are successfully trained on Shenma voice search data in Mandarin, and they outperform deep LSTM models trained with the conventional approach. Moreover, for online streaming speech recognition applications, a shallow model with a low real-time factor is distilled from the very deep model, and recognition accuracy suffers little loss in the distillation process. Therefore, the model trained with the proposed training framework reduces the character error rate by 14% relative,
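The abstract names two concrete techniques: an exponential moving average (EMA) of model weights, and distilling a shallow streaming model from a very deep one. The minimal Python sketch below shows generic versions of both, plus the arithmetic behind a "relative" error-rate reduction. It is not the authors' implementation. The abstract gives neither their EMA schedule nor their distillation objective, so the decay constant, the frame-level soft-target cross-entropy, the temperature parameter, and all function names (`ema_update`, `softmax`, `distillation_loss`, `relative_reduction`) are assumptions for illustration. Parameters are plain scalars for brevity.

```python
import math
from typing import Dict, List


def ema_update(shadow: Dict[str, float], params: Dict[str, float], decay: float) -> None:
    """Update EMA ("shadow") weights in place.

    shadow <- decay * shadow + (1 - decay) * params
    Assumption: a generic per-parameter EMA; the paper's exact schedule is not
    given in the abstract. Parameters are scalars here for brevity.
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 1.0) -> float:
    """Cross-entropy of the student's posteriors against the teacher's soft targets.

    Assumption: a frame-level soft-target objective for a single frame, with the
    deep LSTM as teacher and the shallow streaming model as student.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction, e.g. 0.14 for a 14% relative improvement."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    shadow = {"w": 0.0}
    for _ in range(3):
        ema_update(shadow, {"w": 1.0}, decay=0.5)
    print(shadow["w"])  # 0.875

    teacher = [2.0, 1.0, 0.0]
    # Identical logits: the loss equals the teacher's entropy, its minimum.
    print(distillation_loss(teacher, teacher))
    # A student that disagrees with the teacher gets a larger loss.
    print(distillation_loss([0.0, 1.0, 2.0], teacher))

    # Hypothetical CER values (percent), not figures from the paper.
    print(relative_reduction(10.0, 8.6))
```

With `decay=0.5` and a constant parameter value of 1.0, three updates move the shadow weight from 0.0 to 0.5, 0.75, and then 0.875. Practical decays are usually much closer to 1 so that the averaged model changes slowly. A relative reduction of 0.14 would correspond, for example, to a hypothetical CER falling from 10.0% to 8.6%; the abstract does not report the absolute numbers.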
Related papers
- Long Short-term Memory Based Convolutional Recurrent Neural Networks For Large Vocabulary Speech Recognition (2016) · 6.77
- Exponential Moving Average Model In Parallel Speech Recognition Training (2017) · 0.00
- Neural Speech Recognizer: Acoustic-to-word LSTM Model For Large Vocabulary Speech Recognition (2016) · 15.16
- Exploring RNN-Transducer For Chinese Speech Recognition (2018) · 9.23
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017) · 13.23
- Frame Stacking And Retaining For Recurrent Neural Network Acoustic Model (2017) · 0.00
- Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020) · 0.00
- Recognizing Long-form Speech Using Streaming End-to-end Models (2019) · 13.74