Efficient Acoustic Feature Transformation In Mismatched Environments Using A Guided-gan
2022 Β· Walter Heymans, Marelie H. Davel, Charl van Heerden
Abstract
We propose a new framework to improve automatic speech recognition (ASR) systems in resource-scarce environments using a generative adversarial network (GAN) operating on acoustic input features. The GAN is used to enhance the features of mismatched data prior to decoding, or can optionally be used to fine-tune the acoustic model. We achieve improvements that are comparable to multi-style training (MTR), but at a lower computational cost. With less than one hour of data, an ASR system trained on good quality data, and evaluated on mismatched audio is improved by between 11.5% and 19.7% relative word error rate (WER). Experiments demonstrate that the framework can be very useful in under-resourced environments where training data and computational resources are limited. The GAN does not require parallel training data, because it utilises a baseline acoustic model to provide an additional loss term that guides the generator to create acoustic features that are better classified by the ba
Authors
(none)
Tags
Stats
Related papers
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017)16.14
- Fine-tuning Of Pre-trained End-to-end Speech Recognition With Generative Adversarial Networks (2021)5.84
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024)4.52
- Robust Speech Recognition Using Generative Adversarial Networks (2017)11.29
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021)0.00
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019)10.35
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018)9.23
- Investigating Generative Adversarial Networks Based Speech Dereverberation For Robust Speech Recognition (2018)10.74