Cumulative Adaptation For BLSTM Acoustic Models
2019 · Markus Kitza, Pavel Golik, Ralf Schlüter, et al.
Abstract
This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods. A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation-invariant representations, is used for robust acoustic modelling. In addition, i-vectors are used as an input to the neural network to perform instantaneous speaker and environment adaptation, providing an 8% relative improvement in word error rate on the NIST Hub5 2000 evaluation test set. Enhancing the first-pass i-vector-based adaptation with a second-pass adaptation that uses speaker- and environment-dependent transformations within the network yields a further relative improvement of 5% in word error rate. We also reevaluate the features used to estimate i-vectors, and their normalization, to achieve the best performance in a modern large-scale automatic speech recognition system.
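As a rough illustration of the two ideas in the abstract, the sketch below appends one i-vector to every acoustic feature frame and then compounds the two reported relative improvements. It is not the authors' implementation. The function names and toy values are hypothetical. Frame-wise concatenation is one common way to feed i-vectors to a network, but the abstract only says they are used as an input. The compounding also assumes the 5% gain is measured relative to the i-vector-adapted system, which the abstract does not state explicitly.

```python
def append_ivector(frames, ivector):
    """Append the same i-vector to every acoustic feature frame.

    Assumption: one i-vector per speaker/utterance is concatenated to each
    frame; the abstract does not specify how the i-vector enters the network.
    """
    return [list(frame) + list(ivector) for frame in frames]


def compound_relative_improvement(*improvements):
    """Combine successive relative WER improvements given as fractions.

    Assumption: each improvement is relative to the system that already
    includes all previous improvements.
    """
    remaining = 1.0
    for improvement in improvements:
        remaining *= 1.0 - improvement
    return 1.0 - remaining


# Hypothetical toy data: 3 frames of 4-dimensional features, 2-dimensional i-vector.
frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
ivector = [0.01, -0.02]
augmented = append_ivector(frames, ivector)

# 8% from first-pass i-vector adaptation, then 5% from second-pass adaptation.
total = compound_relative_improvement(0.08, 0.05)
print(f"augmented feature dimension: {len(augmented[0])}")
print(f"combined relative WER improvement: {total:.1%}")
```

Under these assumptions the two passes together correspond to a relative word error rate reduction of about 12.6% over the unadapted baseline, slightly less than the 13% a naive sum would suggest.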