CMGAN: Conformer-based Metric-GAN For Monaural Speech Enhancement
2022 · Sherif Abdulatif, Ruizhe Cao, Bin Yang
Abstract
In this work, we further develop the conformer-based metric generative adversarial network (CMGAN) model for speech enhancement (SE) in the time-frequency (TF) domain. This paper builds on our previous work but takes a more in-depth look by conducting extensive ablation studies on model inputs and architectural design choices. We rigorously tested the generalization ability of the model to unseen noise types and distortions. We have fortified our claims through DNS-MOS measurements and listening tests. Rather than focusing exclusively on the speech denoising task, we extend this work to address the dereverberation and super-resolution tasks. This necessitated exploring various architectural changes, specifically metric discriminator scores and masking techniques. It is essential to highlight that this is among the earliest works that attempted complex TF-domain super-resolution. Our findings show that CMGAN outperforms existing state-of-the-art methods in the three major speech enhance
Related papers
- CMGAN: Conformer-based Metric GAN For Speech Enhancement (2022) 15.13
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) 16.03
- MetricGAN-U: Unsupervised Speech Enhancement/Dereverberation Based Only On Noisy/Reverberated Speech (2021) 11.67
- MetricGAN: Generative Adversarial Networks Based Black-box Metric Scores Optimization For Speech Enhancement (2019) 0.00
- Multi-metric Optimization Using Generative Adversarial Networks For Near-end Speech Intelligibility Enhancement (2021) 8.60
- iMetricGAN: Intelligibility Enhancement For Speech-in-noise Using Generative Adversarial Network-based Metric Learning (2020) 9.41
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) 21.85
- DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network For Speech Enhancement (2020) 0.00