Minimum Word Error Rate Training For Attention-based Sequence-to-sequence Models
2017 · Rohit Prabhavalkar, Tara N. Sainath, Yonghui Wu, et al.
Abstract
Sequence-to-sequence models, such as attention-based models in automatic speech recognition (ASR), are typically trained to optimize the cross-entropy criterion, which corresponds to improving the log-likelihood of the data. However, system performance is usually measured in terms of word error rate (WER), not log-likelihood. Traditional ASR systems benefit from discriminative sequence training, which optimizes criteria such as the state-level minimum Bayes risk (sMBR) that are more closely related to WER. In the present work, we explore techniques to train attention-based models to directly minimize expected word error rate. We consider two loss functions that approximate the expected number of word errors: one samples from the model, and the other uses N-best lists of decoded hypotheses. We find the N-best approach to be more effective than the sampling-based method. In experimental evaluations, we find that the proposed training procedure improves performance by up to 8.2% relative to the baselin
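To make the N-best idea concrete, here is a minimal sketch of how an expected number of word errors could be computed over a small list of hypotheses. It is an illustration under stated assumptions, not the authors' implementation: the function names `word_errors` and `expected_word_errors` are hypothetical, the hypothesis scores are assumed to be log-probabilities that get renormalized over the list, and only the value of the quantity is computed (a training setup would differentiate it with an autodiff framework).

```python
import math


def word_errors(hyp, ref):
    """Word-level edit distance between two whitespace-tokenized strings."""
    h, r = hyp.split(), ref.split()
    # prev[j] = edit distance between the hypothesis prefix so far and r[:j].
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, start=1):
        curr = [i]
        for j, rw in enumerate(r, start=1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # extra word in hypothesis
                            curr[j - 1] + 1,      # missing word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def expected_word_errors(hyps, log_probs, ref):
    """Expected word errors over an N-best list.

    Assumption: the model's log-probabilities are renormalized so that
    they sum to one over the list (a softmax restricted to the N-best).
    """
    m = max(log_probs)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]
    errors = [word_errors(h, ref) for h in hyps]
    return sum(p * e for p, e in zip(probs, errors))


if __name__ == "__main__":
    nbest = ["the cat sat", "the cat sad", "a cat sat down"]
    scores = [-0.5, -1.2, -2.0]  # made-up log-probabilities for illustration
    print(expected_word_errors(nbest, scores, "the cat sat"))
```

The sampling-based variant mentioned in the abstract would replace the fixed N-best list with hypotheses drawn from the model and average their word errors instead of weighting by renormalized probabilities.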
Related papers
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Minimum Bayes Risk Training For End-to-end Speaker-attributed ASR (2020)
- Towards Better Decoding And Language Model Integration In Sequence To Sequence Models (2016)
- Efficient Sequence Training Of Attention Models Using Approximative Recombination (2021)
- On Minimum Word Error Rate Training Of The Hybrid Autoregressive Transducer (2020)
- Efficient Minimum Word Error Rate Training Of RNN-transducer For End-to-end Speech Recognition (2020)
- Automatic Speech Recognition System-independent Word Error Rate Estimation (2024)
- Supervised Attention In Sequence-to-sequence Models For Speech Recognition (2022)