Accelerating Recurrent Neural Network Language Model Based Online Speech Recognition System
2018 · Kyungmin Lee, Chiyoun Park, Namhoon Kim, et al.
Abstract
This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden layer outputs (the history vector) is combined with caching in order to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, in which each layer of the model is computed on the platform better suited to it. The overhead added by data exchanges between CPU and GPU is compensated through a frame-wise batching strategy. Evaluation on the LibriSpeech test sets indicates that reducing the precision of the history vector improves the average recognition speed by a factor of 1.23 with minimal degradation in accuracy. In addition, the CPU-GPU hybrid parallelization enables RNNLM based real-time recognition with a fourfold improvement in speed.
Related papers
- Applying GPGPU To Recurrent Neural Network Language Model Based Fast Network Search In The Real-time LVCSR (2020)
- Improved Neural Language Model Fusion For Streaming Recurrent Neural Network Transducer (2020)
- Developing RNN-T Models Surpassing High-performance Hybrid Models With Customization Capability (2020)
- FPGA-based Low-power Speech Recognition With Recurrent Neural Networks (2016)
- LPCNet: Improving Neural Speech Synthesis Through Linear Prediction (2018)
- On The Effectiveness Of Neural Text Generation Based Data Augmentation For Recognition Of Morphologically Rich Speech (2020)
- Inference Skipping For More Efficient Real-time Speech Enhancement With Parallel RNNs (2022)
- Lattice Rescoring Strategies For Long Short Term Memory Language Models In Speech Recognition (2017)