Efficient Minimum Word Error Rate Training Of RNN-Transducer For End-to-end Speech Recognition
2020 · Jinxi Guo, Gautam Tiwari, Jasha Droppo, et al.
Abstract
In this work, we propose a novel and efficient minimum word error rate (MWER) training method for RNN-Transducer (RNN-T). Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists. The hypothesis probability scores and back-propagated gradients are calculated efficiently using the forward-backward algorithm. Moreover, the proposed method allows us to decouple the decoding and training processes, and thus we can perform offline parallel-decoding and MWER training for each subset iteratively. Experimental results show that this proposed semi-on-the-fly method can speed up the on-the-fly method by 6 times and result in a similar WER improvement (3.6%) over a baseline RNN-T model. The proposed MWER training can also effectively reduce high-deletion errors (9.2% W
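The two computations the abstract describes, summing over all alignments of a hypothesis and taking an expected edit distance over an N-best list, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `rnnt_log_prob` and `mwer_loss_and_grad` are hypothetical, the lattice layout (`blank[t][u]`, `label[t][u]`) is an assumed convention, and renormalizing the hypothesis probabilities over the N-best list is an assumption borrowed from the usual MWER formulation rather than something the abstract states. Only the forward recursion is shown; gradients with respect to the network outputs would additionally need the backward pass or automatic differentiation.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for scalars."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_log_prob(blank, label):
    """Log-probability of one hypothesis, summed over all alignments.

    Assumed layout (hypothetical, for illustration only):
      blank[t][u]: log-prob of emitting blank at frame t after u labels,
                   shape T x (U + 1)
      label[t][u]: log-prob of emitting the (u+1)-th label at frame t,
                   shape T x U
    """
    T = len(blank)
    U = len(blank[0]) - 1
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrive by emitting blank at frame t-1
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t - 1][u] + blank[t - 1][u])
            if u > 0:  # arrive by emitting a label at frame t
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t][u - 1] + label[t][u - 1])
    # Every alignment ends with a blank emitted from the final lattice node.
    return alpha[T - 1][U] + blank[T - 1][U]


def mwer_loss_and_grad(log_probs, word_errors):
    """Expected word errors over an N-best list, plus d(loss)/d(log_probs).

    Assumption: hypothesis probabilities are renormalized over the list.
    """
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    z = sum(weights)
    post = [w / z for w in weights]
    expected = sum(p * e for p, e in zip(post, word_errors))
    grad = [p * (e - expected) for p, e in zip(post, word_errors)]
    return expected, grad


# Toy lattice: T = 2 frames, U = 1 label, every emission has probability 0.5.
half = math.log(0.5)
toy_log_prob = rnnt_log_prob([[half, half], [half, half]], [[half], [half]])

# Toy 3-best list with 0, 1 and 2 word errors.
loss, grad = mwer_loss_and_grad(
    [math.log(0.5), math.log(0.3), math.log(0.2)], [0, 1, 2]
)
```

In the toy lattice there are two alignments of three emissions each, so the summed probability is 2 × 0.125 = 0.25. The gradient entries sum to zero, so hypotheses with fewer errors than the expected count are pushed up and the rest are pushed down.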
Related papers
- Minimum Bayes Risk Training Of RNN-Transducer For End-to-end Speech Recognition (2019)
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019)
- On Minimum Word Error Rate Training Of The Hybrid Autoregressive Transducer (2020)
- Exploring Pre-training With Alignments For RNN Transducer Based End-to-end Speech Recognition (2020)
- Multitask Learning And Joint Optimization For Transformer-RNN-Transducer Speech Recognition (2020)
- Exploring RNN-Transducer For Chinese Speech Recognition (2018)
- Multiple-hypothesis RNN-T Loss For Unsupervised Fine-tuning And Self-training Of Neural Transducer (2022)
- Alignment Restricted Streaming Recurrent Neural Network Transducer (2020)