VSEGAN: Visual Speech Enhancement Generative Adversarial Network
2021 · Xinmeng Xu, Yang Wang, Dongxiang Xu, et al.
Abstract
Speech enhancement is the essential task of improving speech quality in noisy scenarios. Several state-of-the-art approaches have introduced visual information for speech enhancement, since the visual aspect of speech is essentially unaffected by the acoustic environment. This paper proposes a novel framework that uses visual information for speech enhancement by incorporating a Generative Adversarial Network (GAN). In particular, the proposed visual speech enhancement GAN consists of two networks trained in an adversarial manner: i) a generator that adopts a multi-layer feature fusion convolution network to enhance the input noisy speech, and ii) a discriminator that attempts to minimize the discrepancy between the distributions of the clean speech signal and the enhanced speech signal. Experimental results demonstrated superior performance of the proposed model against several state-of-the-art
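The adversarial setup described in the abstract can be illustrated with a toy sketch. The abstract does not say which GAN objective VSEGAN uses, so the least-squares form below (the one used by the related SEGAN work) is an assumption, as are the function names and the `l1_weight` parameter; the discriminator scores are plain lists of floats standing in for network outputs.

```python
# Toy sketch of least-squares adversarial losses.
# ASSUMPTION: the abstract does not state VSEGAN's objective; this LSGAN-style
# form and every name here (discriminator_loss, generator_loss, l1_weight)
# are hypothetical and chosen only for illustration.

def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)

def discriminator_loss(d_real, d_fake):
    """Push scores on clean speech toward 1 and scores on enhanced speech toward 0."""
    real_term = mean([(s - 1.0) ** 2 for s in d_real])
    fake_term = mean([s ** 2 for s in d_fake])
    return 0.5 * (real_term + fake_term)

def generator_loss(d_fake, enhanced, clean, l1_weight=1.0):
    """Push scores on enhanced speech toward 1, plus an L1 distance to clean speech."""
    adversarial = 0.5 * mean([(s - 1.0) ** 2 for s in d_fake])
    l1 = mean([abs(e - c) for e, c in zip(enhanced, clean)])
    return adversarial + l1_weight * l1

# Example: a discriminator that separates the two classes perfectly has zero loss.
print(discriminator_loss(d_real=[1.0, 1.0], d_fake=[0.0, 0.0]))
```

In this sketch the generator is rewarded when the discriminator scores its enhanced output as clean, which is one common way to pull the enhanced-speech distribution toward the clean-speech distribution.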
Related papers
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) 21.85
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) 16.03
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019) 10.35
- Video-driven Speech Reconstruction Using Generative Adversarial Networks (2019) 11.39
- LA-VocE: Low-SNR Audio-visual Speech Enhancement Using Neural Vocoders (2022) 0.00
- On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks (2018) 12.33
- DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network For Speech Enhancement (2020) 0.00
- SEFGAN: Harvesting The Power Of Normalizing Flows And GANs For Efficient High-quality Speech Enhancement (2023) 5.84