SEFGAN: Harvesting The Power Of Normalizing Flows And GANs For Efficient High-quality Speech Enhancement
2023 · Martin Strauss, Nicola Pia, Nagashree K. S. Rao, et al.
Abstract
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned on noisy speech, using a Normalizing Flow (NF) as the generator in a GAN framework. While the combination of likelihood models and GANs is not trivial, SEFGAN demonstrates that a hybrid adversarial and maximum likelihood training approach enables the model to maintain both high-quality audio generation and log-likelihood estimation. Our experiments indicate that this approach strongly outperforms the baseline NF-based model without introducing additional complexity to the enhancement network. A comparison using computational metrics and a listening experiment reveals that SEFGAN is competitive with other state-of-the-art models.
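The hybrid objective described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the single conditional affine flow, the least-squares form of the adversarial term, the weighting factor `adv_weight`, and all function and parameter names below are assumptions chosen only to show how a likelihood term and an adversarial term might be summed for a flow-based generator. The likelihood term follows the standard change-of-variables formula for normalizing flows, log p(x | y) = log p(z) + log |det dz/dx|.

```python
import math


def flow_forward(clean, noisy, log_scale, cond_weight):
    """Map clean speech samples to latent values, conditioned on noisy samples.

    Hypothetical one-layer affine flow: z_i = (x_i - cond_weight * y_i) * exp(-log_scale).
    Returns the latent values and log|det dz/dx|.
    """
    scale_inv = math.exp(-log_scale)
    latent = [(c - cond_weight * n) * scale_inv for c, n in zip(clean, noisy)]
    log_det = -log_scale * len(clean)
    return latent, log_det


def flow_inverse(latent, noisy, log_scale, cond_weight):
    """Invert flow_forward: generate enhanced samples from latent values."""
    scale = math.exp(log_scale)
    return [z * scale + cond_weight * n for z, n in zip(latent, noisy)]


def negative_log_likelihood(clean, noisy, log_scale, cond_weight):
    """NLL of clean speech under the flow with a standard normal latent prior."""
    latent, log_det = flow_forward(clean, noisy, log_scale, cond_weight)
    log_prior = sum(-0.5 * z * z - 0.5 * math.log(2.0 * math.pi) for z in latent)
    return -(log_prior + log_det)


def generator_adversarial_loss(score_on_generated):
    """Least-squares GAN generator loss (an assumed choice): (D(G(z)) - 1)^2."""
    return (score_on_generated - 1.0) ** 2


def hybrid_generator_loss(clean, noisy, latent_sample, discriminator,
                          log_scale, cond_weight, adv_weight):
    """Sum of the maximum likelihood term and a weighted adversarial term."""
    nll = negative_log_likelihood(clean, noisy, log_scale, cond_weight)
    # The same flow is run in reverse to act as the GAN generator.
    enhanced = flow_inverse(latent_sample, noisy, log_scale, cond_weight)
    adv = generator_adversarial_loss(discriminator(enhanced))
    return nll + adv_weight * adv
```

Because the flow is invertible, the same parameters serve both terms: the forward direction yields an exact likelihood for the clean signal, while the inverse direction produces the sample that the discriminator scores. In a real system the discriminator would be a trained network and the flow would have many coupling layers; here `discriminator` is any callable returning a scalar score.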
Related papers
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)
- Improved Normalizing Flow-based Speech Enhancement Using An All-pole Gammatone Filterbank For Conditional Input Representation (2022)
- FNSE-SBGAN: Far-field Speech Enhancement With Schrodinger Bridge And Generative Adversarial Networks (2025)
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017)
- DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network For Speech Enhancement (2020)
- VSEGAN: Visual Speech Enhancement Generative Adversarial Network (2021)
- Dynamic Attention Based Generative Adversarial Network With Phase Post-processing For Speech Enhancement (2020)
- On The Use Of Audio Fingerprinting Features For Speech Enhancement With Generative Adversarial Network (2020)