End-to-end Adaptation With Backpropagation Through WFST For On-device Speech Recognition System
2019 · Emiru Tsunoo, Yosuke Kashiwagi, Satoshi Asakawa, et al.
Abstract
An on-device DNN-HMM speech recognition system works efficiently with a limited vocabulary in the presence of a variety of predictable noise. In such a case, vocabulary and environment adaptation is highly effective. In this paper, we propose a novel method of end-to-end (E2E) adaptation, which adjusts not only an acoustic model (AM) but also a weighted finite-state transducer (WFST). We convert a pretrained WFST to a trainable neural network and adapt the system to target environments/vocabulary by E2E joint training with an AM. We replicate Viterbi decoding with forward-backward neural network computation, which is similar to recurrent neural networks (RNNs). By pooling output score sequences, a vocabulary posterior for each utterance is obtained and used for discriminative loss computation. Experiments using 2-10 hours of English/Japanese adaptation datasets indicate that the fine-tuning of only WFSTs and that of only AMs are both comparable to a state-of-the-art adaptation method.
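The pipeline the abstract describes (a Viterbi recursion written as an RNN-like forward pass, pooling of the score sequence, and a discriminative loss on the resulting vocabulary posterior) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The dense transition matrix, the choice of max pooling over time, the softmax normalization, the cross-entropy loss, and every name below (`viterbi_scores`, `vocabulary_posterior`, `word_final_states`, the toy weights) are assumptions made for illustration only.

```python
import numpy as np

NEG_INF = -1e30  # finite stand-in for log(0), keeps the arithmetic well defined


def viterbi_scores(log_trans, log_emit, log_init):
    """RNN-like Viterbi recursion in the log domain.

    log_trans[i, j]: log weight of the arc from state i to state j
                     (these would be the trainable WFST parameters).
    log_emit[t, j]:  acoustic-model log score of state j at frame t.
    Returns alpha, where alpha[t, j] is the best path score ending in j at t.
    """
    num_frames, num_states = log_emit.shape
    alpha = np.empty((num_frames, num_states))
    alpha[0] = log_init + log_emit[0]
    for t in range(1, num_frames):
        # Max over predecessor states i of alpha[t-1, i] + log_trans[i, j].
        best_prev = np.max(alpha[t - 1][:, None] + log_trans, axis=0)
        alpha[t] = best_prev + log_emit[t]
    return alpha


def vocabulary_posterior(alpha, word_final_states):
    """Max-pool each word's final-state score over time, then apply softmax."""
    pooled = alpha[:, word_final_states].max(axis=0)
    pooled = pooled - pooled.max()  # subtract the maximum for numerical stability
    probs = np.exp(pooled)
    return probs / probs.sum()


def cross_entropy(posterior, target_word):
    """Discriminative loss for one utterance with a known target word."""
    return -np.log(posterior[target_word])


# Hypothetical toy graph: state 0 is a shared start state,
# state 1 ends word 0, and state 2 ends word 1.
log_trans = np.full((3, 3), NEG_INF)
log_trans[0, 0] = np.log(0.5)
log_trans[0, 1] = np.log(0.25)
log_trans[0, 2] = np.log(0.25)
log_trans[1, 1] = 0.0
log_trans[2, 2] = 0.0

log_init = np.array([0.0, NEG_INF, NEG_INF])
log_emit = np.log(np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]))

alpha = viterbi_scores(log_trans, log_emit, log_init)
posterior = vocabulary_posterior(alpha, word_final_states=[1, 2])
loss = cross_entropy(posterior, target_word=1)
print(posterior, loss)
```

In an automatic-differentiation framework, the same forward pass would let gradients of the loss flow back through the max operations into both `log_trans` and the acoustic scores, which is the sense in which the WFST and the AM can be adapted jointly.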
Related papers
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021)
- Cumulative Adaptation For BLSTM Acoustic Models (2019)
- Run-time Adaptation Of Neural Beamforming For Robust Speech Dereverberation And Denoising (2024)
- Fast And Accurate Factorized Neural Transducer For Text Adaption Of End-to-end Speech Recognition Models (2022)
- A Review Of On-device Fully Neural End-to-end Automatic Speech Recognition Algorithms (2020)
- WNARS: WFST Based Non-autoregressive Streaming End-to-end Speech Recognition (2021)
- Generative Adversarial Network Based Speaker Adaptation For High Fidelity Wavenet Vocoder (2018)
- Incremental Layer-wise Self-supervised Learning For Efficient Speech Domain Adaptation On Device (2021)