On Minimum Word Error Rate Training Of The Hybrid Autoregressive Transducer
2020 · Liang Lu, Zhong Meng, Naoyuki Kanda, et al.
Abstract
The Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) to support external language model (LM) fusion. In HAT, the blank probability and the label probability are estimated using two separate probability distributions, which gives a more accurate estimate of the internal LM score and therefore works better when combined with an external LM. Previous work mainly focuses on training HAT models with the negative log-likelihood loss. In this paper, we study minimum word error rate (MWER) training of HAT, a criterion that is closer to the evaluation metric for speech recognition and has been successfully applied to other types of end-to-end models such as sequence-to-sequence (S2S) and RNN-T models. From experiments with around 30,000 hours of training data, we show that MWER training can improve the accuracy of HAT models, while at the same time, improving
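The two ideas named in the abstract can be illustrated with a minimal sketch. It assumes the usual HAT factorization (a Bernoulli blank probability from a sigmoid, with label probabilities given by a softmax scaled by one minus the blank probability) and a common N-best form of the MWER objective from the literature (expected word errors under renormalized hypothesis probabilities, with the mean error subtracted). The exact loss used in the paper is not given in this excerpt, and the function names `hat_local_distribution`, `word_errors`, and `mwer_loss` are hypothetical.

```python
import math


def hat_local_distribution(blank_logit, label_logits):
    """Assumed HAT-style factorization: blank and labels use separate distributions."""
    # Blank probability comes from its own sigmoid (a Bernoulli decision).
    p_blank = 1.0 / (1.0 + math.exp(-blank_logit))
    # Label probabilities: a softmax over labels, scaled by the non-blank mass.
    m = max(label_logits)
    exps = [math.exp(x - m) for x in label_logits]
    z = sum(exps)
    p_labels = [(1.0 - p_blank) * e / z for e in exps]
    return p_blank, p_labels


def word_errors(hyp, ref):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra word in hyp
                           cur[j - 1] + 1,           # word missing from hyp
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def mwer_loss(nbest, ref):
    """Expected word errors over an N-best list, relative to the mean error.

    nbest: list of (hypothesis_words, log_probability) pairs.
    """
    m = max(lp for _, lp in nbest)
    weights = [math.exp(lp - m) for _, lp in nbest]
    z = sum(weights)  # renormalize probabilities over the N-best list
    errs = [word_errors(h, ref) for h, _ in nbest]
    mean_err = sum(errs) / len(errs)
    return sum(w / z * (e - mean_err) for w, e in zip(weights, errs))


if __name__ == "__main__":
    p_blank, p_labels = hat_local_distribution(0.0, [1.0, 2.0, 3.0])
    print(p_blank, sum(p_labels))
    nbest = [(["a", "b"], math.log(0.75)), (["a", "c"], math.log(0.25))]
    print(mwer_loss(nbest, ["a", "b"]))
```

In this toy N-best list the correct hypothesis carries most of the probability mass, so the loss is negative. Shifting mass toward the erroneous hypothesis raises it, which is the behavior an MWER-style criterion penalizes.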
Related papers
- Modular Hybrid Autoregressive Transducer (2022) 8.35
- Efficient Minimum Word Error Rate Training Of RNN-Transducer For End-to-end Speech Recognition (2020) 11.19
- Boosting Hybrid Autoregressive Transducer-based ASR With Internal Acoustic Model Training And Dual Blank Thresholding (2024) 2.26
- Minimum Word Error Rate Training For Attention-based Sequence-to-sequence Models (2017) 14.35
- Minimum Bayes Risk Training Of RNN-Transducer For End-to-end Speech Recognition (2019) 0.00
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00
- H_eval: A New Hybrid Evaluation Metric For Automatic Speech Recognition Tasks (2022) 6.34
- Multiple-hypothesis RNN-T Loss For Unsupervised Fine-tuning And Self-training Of Neural Transducer (2022) 0.00