HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
2020 · Jiaqi Su, Zeyu Jin, Adam Finkelstein
Abstract
Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.
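The abstract names two training ingredients: time-domain discriminators that see the waveform at several scales, and a deep feature matching loss computed on the discriminators' activations. Below is a minimal Python sketch of both, under stated assumptions. Each discriminator's activations arrive as plain lists of floats (one flattened list per layer). Downsampling is simple average pooling. Every discriminator and layer is weighted equally. The helper names `downsample`, `multi_scale`, and `feature_matching_loss` are hypothetical. The paper's actual resampling filters, layer sizes, and loss weights are not given on this page, and its time-frequency (spectrogram) discriminator is not reproduced.

```python
from typing import List, Sequence

# Assumption: each layer's activation map is flattened to a list of floats,
# and one discriminator's features are the list of its layers' maps.
LayerFeatures = Sequence[float]
DiscriminatorFeatures = Sequence[LayerFeatures]


def downsample(wave: Sequence[float], factor: int) -> List[float]:
    """Average-pool a waveform by `factor` (hypothetical stand-in for a
    proper anti-aliased resampler). Trailing samples that do not fill a
    full window are dropped."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(wave) - len(wave) % factor
    return [sum(wave[i:i + factor]) / factor for i in range(0, usable, factor)]


def multi_scale(wave: Sequence[float],
                factors: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """One input per time-domain discriminator: full, half and quarter
    rate by default (these factors are an assumption)."""
    return [downsample(wave, f) for f in factors]


def _l1_mean(a: LayerFeatures, b: LayerFeatures) -> float:
    """Mean absolute difference between two equally sized feature maps."""
    if len(a) != len(b) or not a:
        raise ValueError("feature maps must be non-empty and equally sized")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(real: Sequence[DiscriminatorFeatures],
                          fake: Sequence[DiscriminatorFeatures]) -> float:
    """For each discriminator, average the L1 distance between its layer
    activations on clean (real) and enhanced (fake) speech, then sum the
    per-discriminator averages."""
    if len(real) != len(fake):
        raise ValueError("need features from the same discriminators")
    total = 0.0
    for real_layers, fake_layers in zip(real, fake):
        if len(real_layers) != len(fake_layers) or not real_layers:
            raise ValueError("discriminator layer counts must match")
        per_layer = [_l1_mean(r, f) for r, f in zip(real_layers, fake_layers)]
        total += sum(per_layer) / len(per_layer)
    return total


if __name__ == "__main__":
    print(multi_scale([1.0, 3.0, 5.0, 7.0]))
    # → [[1.0, 3.0, 5.0, 7.0], [2.0, 6.0], [4.0]]
    real = [[[1.0, 2.0], [0.0]], [[3.0]]]  # two discriminators
    fake = [[[1.0, 0.0], [1.0]], [[3.0]]]
    print(feature_matching_loss(real, fake))  # → 1.0
```

In a real training loop, the feature lists would come from running each discriminator on the clean target and on the enhanced output, and this term would be combined with the generator's adversarial loss using weights this sketch leaves open. Matching intermediate activations, rather than only the discriminators' final real/fake scores, is commonly motivated as giving the generator a denser training signal.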
Related papers
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (2020)
- High Fidelity Speech Synthesis with Adversarial Networks (2019)
- HiFi++: A Unified Framework for Bandwidth Extension and Speech Enhancement (2022)
- HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution (2025)
- HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation (2022)
- Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training (2018)
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)
- SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis (2024)