Bayesian Learning Of LF-MMI Trained Time Delay Neural Networks For Speech Recognition
2020 · Shoukang Hu, Xurong Xie, Shansong Liu, et al.
Abstract
Discriminative training techniques define state-of-the-art performance for automatic speech recognition systems. However, they are inherently prone to overfitting, leading to poor generalization performance when training data is limited. To address this issue, this paper presents a full Bayesian framework to account for model uncertainty in sequence discriminative training of factored TDNN acoustic models. Several Bayesian learning-based TDNN variant systems are proposed to model the uncertainty over weight parameters and choices of hidden activation functions, or the hidden layer outputs. Efficient variational inference approaches using as few as a single parameter sample keep their computational cost in both training and evaluation comparable to that of the baseline TDNN systems. Statistically significant word error rate (WER) reductions of 0.4%-1.8% absolute (5%-11% relative) were obtained over a state-of-the-art 900-hour speed-perturbed Switchboard corpus traine
Related papers
- A Comparison Of Lattice-free Discriminative Training Criteria For Purely Sequence-trained Neural Network Acoustic Models (2018) 4.52
- BayesSpeech: A Bayesian Transformer Network For Automatic Speech Recognition (2023) 0.00
- Bayesian Learning For Deep Neural Network Adaptation (2020) 9.76
- A Novel Pyramidal-FSMN Architecture With Lattice-free MMI For Speech Recognition (2018) 0.00
- Unsupervised Model-based Speaker Adaptation Of End-to-end Lattice-free MMI Model For Speech Recognition (2022) 2.26
- Minimum Bayes Risk Training Of RNN-Transducer For End-to-end Speech Recognition (2019) 0.00
- MFA: TDNN With Multi-scale Frequency-channel Attention For Text-independent Speaker Verification With Short Utterances (2022) 13.79
- Simplified End-to-end MMI Training And Voting For ASR (2017) 0.00