Long Short-term Memory Based Convolutional Recurrent Neural Networks For Large Vocabulary Speech Recognition
2016 · Xiangang Li, Xihong Wu
Abstract
Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they can learn a dynamically changing contextual window over the entire sequence history. Convolutional neural networks (CNNs), on the other hand, have brought significant improvements to deep feed-forward neural networks (FFNNs), as they are better able to reduce spectral variation in the input signal. In this paper, a network architecture called the convolutional recurrent neural network (CRNN) is proposed by combining the CNN and the LSTM RNN. In the proposed CRNNs, each speech frame, without adjacent context frames, is organized as a number of local feature patches along the frequency axis, and an LSTM network is then applied to each feature patch along the time axis. We train and compare FFNNs, LSTM RNNs and the proposed LSTM CRNNs in a variety of configurations. Experimental results show that the LSTM CRNNs can exc
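The data flow described in the abstract (split each frame into local patches along frequency, then run an LSTM over time on each patch) can be illustrated with a minimal sketch. This is not the authors' implementation. The patch size, stride, hidden size, random initialization, the peephole-free cell, and the choice to share one set of LSTM weights across all patches are assumptions made for illustration, and the names `split_into_patches`, `TinyLSTMCell` and `crnn_layer` are hypothetical.

```python
import math
import random


def split_into_patches(frame, patch_size, stride):
    """Split one frame (a list of spectral features) into local patches
    along the frequency axis. patch_size and stride are assumed values."""
    return [frame[i:i + patch_size]
            for i in range(0, len(frame) - patch_size + 1, stride)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMCell:
    """A standard LSTM cell without peepholes (an assumption; the abstract
    does not specify the exact LSTM variant)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input patch and previous hidden state

        def pre(gate, j):
            return sum(w * v for w, v in zip(self.w[gate][j], z)) + self.b[gate][j]

        new_h, new_c = [], []
        for j in range(self.hidden_size):
            i_gate = sigmoid(pre("i", j))
            f_gate = sigmoid(pre("f", j))
            o_gate = sigmoid(pre("o", j))
            cand = math.tanh(pre("g", j))
            c_j = f_gate * c[j] + i_gate * cand
            new_c.append(c_j)
            new_h.append(o_gate * math.tanh(c_j))
        return new_h, new_c


def crnn_layer(frames, cell, patch_size, stride):
    """Apply the same LSTM cell along time to every frequency patch.
    Each patch keeps its own recurrent state; weights are shared (assumption).
    Returns outputs[t][p], the hidden vector of patch p at time t."""
    num_patches = len(split_into_patches(frames[0], patch_size, stride))
    h = [[0.0] * cell.hidden_size for _ in range(num_patches)]
    c = [[0.0] * cell.hidden_size for _ in range(num_patches)]
    outputs = []
    for frame in frames:  # single frames, no adjacent context frames
        for p, patch in enumerate(split_into_patches(frame, patch_size, stride)):
            h[p], c[p] = cell.step(patch, h[p], c[p])
        outputs.append([list(v) for v in h])
    return outputs


# Usage: 5 frames of 40 features, patches of 8 bins with stride 4.
rng = random.Random(0)
frames = [[rng.random() for _ in range(40)] for _ in range(5)]
cell = TinyLSTMCell(input_size=8, hidden_size=3, rng=rng)
out = crnn_layer(frames, cell, patch_size=8, stride=4)
print(len(out), len(out[0]), len(out[0][0]))  # 5 frames, 9 patches, 3 hidden units
```

Sharing the cell across patches mirrors the weight sharing of a convolution along frequency, while the per-patch state carries the temporal context. Whether the paper shares weights in this way, or adds pooling on top of the patch outputs, is not stated in the abstract.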
Related papers
- Deep LSTM For Large Vocabulary Continuous Speech Recognition (2017) 14.58
- Analyzing Large Receptive Field Convolutional Networks For Distant Speech Recognition (2019) 5.84
- Learning Compact Recurrent Neural Networks (2016) 0.00
- Neural Speech Recognizer: Acoustic-to-word LSTM Model For Large Vocabulary Speech Recognition (2016) 15.16
- Bidirectional Quaternion Long-short Term Memory Recurrent Neural Networks For Speech Recognition (2018) 9.41
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017) 0.00
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks (2020) 5.84
- Memory Visualization For Gated Recurrent Neural Networks In Speech Recognition (2016) 11.76