Robust Speech Recognition Using Generative Adversarial Networks
2017 · Anuroop Sriram, Heewoo Jun, Yashesh Gaur, et al.
Abstract
This paper describes a general, scalable, end-to-end framework that uses the generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Unlike previous methods, the new framework does not rely on domain expertise or simplifying assumptions as are often needed in signal processing, and directly encourages robustness in a data-driven way. We show the new approach improves simulated far-field speech recognition of vanilla sequence-to-sequence models without specialized front-ends or preprocessing.
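The idea of pulling noisy-audio embeddings toward clean-audio embeddings with an adversarial objective can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: a linear critic, a Wasserstein-style gap between mean critic scores, weight clipping, and the hypothetical names `embedding_gap`, `critic_step`, `encoder_loss`, and `lam`. It is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def mean_vector(batch: List[Vector]) -> Vector:
    """Column-wise mean of a batch of embeddings."""
    n = len(batch)
    return [sum(col) / n for col in zip(*batch)]


def critic_score(w: Vector, z: Vector) -> float:
    """Toy linear critic (assumption): f(z) = w . z."""
    return sum(wi * zi for wi, zi in zip(w, z))


def embedding_gap(w: Vector, clean: List[Vector], noisy: List[Vector]) -> float:
    """Wasserstein-style gap: mean critic score on clean minus noisy embeddings."""
    mean_clean = sum(critic_score(w, z) for z in clean) / len(clean)
    mean_noisy = sum(critic_score(w, z) for z in noisy) / len(noisy)
    return mean_clean - mean_noisy


def critic_step(w: Vector, clean: List[Vector], noisy: List[Vector],
                lr: float = 0.1, clip: float = 0.05) -> Vector:
    """One gradient-ascent step on the gap, followed by weight clipping.

    For a linear critic the gradient of the gap with respect to w is
    mean(clean) - mean(noisy).
    """
    grad = [c - n for c, n in zip(mean_vector(clean), mean_vector(noisy))]
    return [max(-clip, min(clip, wi + lr * gi)) for wi, gi in zip(w, grad)]


def encoder_loss(task_loss: float, w: Vector, clean: List[Vector],
                 noisy: List[Vector], lam: float = 1.0) -> float:
    """Encoder objective (assumption): recognition loss plus a weighted gap.

    Minimizing the gap pushes noisy embeddings toward the clean ones.
    """
    return task_loss + lam * embedding_gap(w, clean, noisy)


# Hypothetical 2-dimensional embeddings of the same utterances.
clean = [[1.0, 0.0], [0.8, 0.2]]
noisy = [[0.2, 0.9], [0.0, 1.1]]

w = critic_step([0.0, 0.0], clean, noisy)
print("gap after one critic step:", embedding_gap(w, clean, noisy))
print("encoder loss:", encoder_loss(1.0, w, clean, noisy))
```

In this sketch the critic learns to tell the two embedding sets apart, and the encoder is penalized in proportion to how well it succeeds. When the noisy embeddings coincide with the clean ones, the gap is zero and only the recognition loss remains.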
Related papers
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024)
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017)
- Investigating Generative Adversarial Networks Based Speech Dereverberation For Robust Speech Recognition (2018)
- Fine-tuning Of Pre-trained End-to-end Speech Recognition With Generative Adversarial Networks (2021)
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021)
- Generative Adversarial Speaker Embedding Networks For Domain Robust End-to-end Speaker Verification (2018)
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019)
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)