MFCCGAN: A Novel MFCC-based Speech Synthesizer Using Adversarial Learning
2023 · Mohammad Reza Hasanabadi, Majid Behdad, Davood Gharavian
Abstract
In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of the GAN model, it produces speech with higher intelligibility than the rule-based MFCC-based speech synthesizer WORLD. We evaluate the model with a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that the proposed system outperforms Librosa MFCC inversion, with an increase of about 26% up to 53% in STOI and 16% up to 78% in NISQA score. Compared with the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family, it achieves a rise of about 10% in intelligibility and about 4% in naturalness. WORLD, however, needs additional input data such as F0. Finally, using a STOI-based perceptual loss in the discriminators could further improve the quality. WebMUSHRA-based subjective tests also demonstrate the quality of the proposed approach.
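To make the percentage figures above concrete, the following minimal sketch shows one common way to express an improvement in an objective score such as STOI or NISQA: as a relative gain over a baseline. It assumes the reported percentages are relative gains, which the abstract does not state explicitly. The function name `relative_gain_percent` and the two placeholder scores are hypothetical and are not taken from the paper.

```python
def relative_gain_percent(baseline: float, proposed: float) -> float:
    """Return the relative improvement of `proposed` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (proposed - baseline) / baseline


# Placeholder scores chosen only to illustrate the arithmetic.
# They are NOT results from the paper.
placeholder_baseline_stoi = 0.50   # e.g. a baseline inversion method
placeholder_proposed_stoi = 0.65   # e.g. a learned synthesizer

gain = relative_gain_percent(placeholder_baseline_stoi, placeholder_proposed_stoi)
print(f"relative STOI gain: {gain:.1f}%")
```

A relative gain depends on the baseline value, so the same absolute difference in STOI yields a larger percentage when the baseline score is low.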
Related papers
- High Fidelity Speech Synthesis With Adversarial Networks (2019)
- Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks (2018)
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017)
- GANSpeech: Adversarial Training For High-fidelity Multi-speaker Speech Synthesis (2021)
- HiFi-GAN: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020)