Frame Stacking And Retaining For Recurrent Neural Network Acoustic Model
2017 · Xu Tian, Jun Zhang, Zejun Ma, et al.
Abstract
Frame stacking is widely applied in end-to-end neural network training, such as connectionist temporal classification (CTC), where it leads to more accurate models and faster decoding. However, it is not well suited to conventional neural network acoustic models based on context-dependent states if the decoder is left unchanged. In this paper, we propose a novel frame retaining method that is applied during decoding. A system that combines frame retaining with frame stacking reduces the time consumption of both training and decoding. Long short-term memory (LSTM) recurrent neural networks (RNNs) using this combination achieve an almost linear training speedup and reduce the real time factor (RTF) by a relative 41%. At the same time, recognition performance shows no degradation, or improves slightly, on the Shenma voice search dataset in Mandarin.
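As a rough illustration of the two ideas named in the abstract, the sketch below stacks consecutive feature frames into super-frames and then expands per-super-frame outputs back to the original frame rate. The abstract does not define frame retaining, so `retain_outputs` reflects only one assumed reading (each network output is reused for the frames it covers so that an unchanged decoder still sees one output per original frame). The function names, the padding rule, and the stack size of 3 are hypothetical choices for this example, not details taken from the paper.

```python
from typing import List, Sequence

Frame = List[float]


def stack_frames(frames: Sequence[Frame], stack: int) -> List[Frame]:
    """Concatenate each run of `stack` consecutive frames into one super-frame.

    Assumption: when the utterance length is not a multiple of `stack`,
    the final frame is repeated as padding.
    """
    if stack < 1:
        raise ValueError("stack must be >= 1")
    if not frames:
        return []
    stacked: List[Frame] = []
    for start in range(0, len(frames), stack):
        group = list(frames[start:start + stack])
        while len(group) < stack:
            group.append(frames[-1])  # pad the last group with the final frame
        # Flatten the group into a single, longer feature vector.
        stacked.append([value for frame in group for value in frame])
    return stacked


def retain_outputs(outputs: Sequence[Frame], stack: int, num_frames: int) -> List[Frame]:
    """Reuse each per-super-frame output for `stack` original frames.

    Assumed interpretation of "frame retaining", not the paper's definition.
    The result is trimmed to the original number of frames.
    """
    expanded = [list(out) for out in outputs for _ in range(stack)]
    return expanded[:num_frames]


# Seven 2-dimensional frames become three 6-dimensional super-frames.
frames = [[float(t), float(t) + 0.5] for t in range(7)]
stacked = stack_frames(frames, stack=3)

# Stand-in network outputs, one per super-frame (made-up values).
outputs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
retained = retain_outputs(outputs, stack=3, num_frames=len(frames))
```

With a stack size of 3, the network processes roughly one third as many time steps, which is the kind of saving the abstract attributes to stacking; the expansion step is cheap by comparison because it only copies outputs.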