Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition
2017 · Chris Donahue, Bo Li, Rohit Prabhavalkar
Abstract
We investigate the effectiveness of generative adversarial networks (GANs) for speech enhancement, in the context of improving noise robustness of automatic speech recognition (ASR) systems. Prior work demonstrates that GANs can effectively suppress additive noise in raw waveform speech signals, improving perceptual quality metrics; however, this technique was not justified in the context of ASR. In this work, we conduct a detailed study to measure the effectiveness of GANs in enhancing speech contaminated by both additive and reverberant noise. Motivated by recent advances in image processing, we propose operating GANs on log-Mel filterbank spectra instead of waveforms, which requires less computation and is more robust to reverberant noise. While GAN enhancement improves the performance of a clean-trained ASR system on noisy speech, it falls short of the performance achieved by conventional multi-style training (MTR). By appending the GAN-enhanced features to the noisy inputs and retr…
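The abstract's two feature-level ideas, running the GAN on log-Mel filterbank spectra rather than raw waveforms and appending GAN-enhanced features to the noisy inputs, can be illustrated with a small feature pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the 25 ms window, 10 ms hop, 512-point transform and 40 mel bands are hypothetical choices, a naive DFT stands in for an FFT to keep the example dependency-free, and the GAN generator that would produce the enhanced features is not shown (an identity copy is used as a placeholder).

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters mapping n_fft//2 + 1 power bins onto n_mels bands."""
    n_bins = n_fft // 2 + 1
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [lo + (hi - lo) * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    fb = [[0.0] * n_bins for _ in range(n_mels)]
    for j in range(n_mels):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):   # rising edge of the triangle
            fb[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fb[j][k] = (right - k) / (right - center)
    return fb

def power_spectrum(frame, n_fft):
    """|DFT|^2 of a zero-padded frame (naive O(n^2) DFT instead of an FFT)."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for n, x in enumerate(padded):
            angle = 2.0 * math.pi * k * n / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spec.append(re * re + im * im)
    return spec

def log_mel_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_mels=40, eps=1e-6):
    """Frame, Hamming-window, take the power spectrum, apply mel filters, then log."""
    fb = mel_filterbank(n_mels, n_fft, sample_rate)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = power_spectrum(frame, n_fft)
        feats.append([math.log(sum(f * p for f, p in zip(row, power)) + eps) for row in fb])
    return feats

def append_enhanced(noisy_feats, enhanced_feats):
    """Concatenate per-frame noisy and enhanced features along the feature axis."""
    if len(noisy_feats) != len(enhanced_feats):
        raise ValueError("noisy and enhanced features must have the same frame count")
    return [n + e for n, e in zip(noisy_feats, enhanced_feats)]

# Usage: a short synthetic noisy tone (800 samples at 16 kHz gives 3 frames).
sr = 16000
noisy = [math.sin(2 * math.pi * 440 * t / sr) + 0.1 * (((t * 7919) % 13) - 6) / 6
         for t in range(800)]
feats = log_mel_features(noisy, sr)
# Hypothetical placeholder: a trained GAN generator would map feats to enhanced feats.
enhanced = [row[:] for row in feats]
stacked = append_enhanced(feats, enhanced)  # 3 frames x 80 features
```

Working in the 40-dimensional log-Mel domain means the generator processes a few dozen values per 10 ms frame instead of 160 raw samples, which is one way to read the abstract's claim of lower computation; the concatenated 80-dimensional frames would then serve as the acoustic model's input.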
Related papers
- Investigating Generative Adversarial Networks Based Speech Dereverberation For Robust Speech Recognition (2018) · 10.74
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018) · 9.23
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019) · 10.35
- Robust Speech Recognition Using Generative Adversarial Networks (2017) · 11.29
- Efficient Acoustic Feature Transformation In Mismatched Environments Using A Guided-gan (2022) · 2.26
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) · 16.03
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) · 21.85
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024) · 4.52