Speech Waveform Synthesis From MFCC Sequences With Generative Adversarial Networks
2018 · Lauri Juvela, Bajibabu Bollepalli, Xin Wang, et al.
Abstract
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as ASR but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.
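The second step of the abstract, converting the spectral envelope held in MFCCs into an all-pole filter, can be illustrated with a rough sketch. This is not the authors' implementation: the helper names (`dct_matrix`, `mel_filterbank`, `levinson_durbin`, `mfcc_to_allpole`), the filterbank pseudo-inverse, the 40 dB spectral floor, and every default parameter (40 mel bands, 512-point FFT, 16 kHz, order 20) are assumptions made for illustration. It also assumes the MFCCs were computed as an orthonormal DCT-II of natural-log mel energies.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """First n_out rows of the orthonormal DCT-II matrix of size n_in."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.sqrt(2.0 / n_in) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] /= np.sqrt(2.0)
    return mat

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    max_mel = 2595.0 * np.log10(1.0 + (sample_rate / 2) / 700.0)
    mel_points = np.linspace(0.0, max_mel, n_mels + 2)
    edges = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)  # band edges in Hz
    freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where a[0] == 1 and err is the prediction error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def mfcc_to_allpole(mfcc, n_mels=40, n_fft=512, sample_rate=16000, order=20):
    """Map one MFCC frame to all-pole coefficients (illustrative only)."""
    dct = dct_matrix(mfcc.shape[0], n_mels)
    # Rows of dct are orthonormal, so the transpose acts as a pseudo-inverse
    # and yields a smoothed log-mel spectrum from the truncated cepstrum.
    mel_energy = np.exp(dct.T @ mfcc)
    # Assumption: approximate the linear-frequency power spectrum with the
    # pseudo-inverse of the mel filterbank, then floor it 40 dB below the peak.
    fb = mel_filterbank(n_mels, n_fft, sample_rate)
    power = np.linalg.pinv(fb) @ mel_energy
    power = np.maximum(power, 1e-4 * power.max())
    # The autocorrelation is the inverse FFT of the power spectrum.
    r = np.fft.irfft(power)[: order + 1]
    return levinson_durbin(r, order)

# Example call on a synthetic 20-coefficient frame.
frame = np.zeros(20)
frame[0], frame[1] = 5.0, -2.0
coeffs, residual_power = mfcc_to_allpole(frame)
```

The returned `coeffs` define the denominator of a filter 1/A(z); an excitation signal passed through that filter takes on the spectral envelope encoded by the MFCC frame. Flooring the reconstructed spectrum keeps the autocorrelation well conditioned, which keeps the Levinson-Durbin recursion numerically stable.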
Related papers
- MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning (2023)
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019)
- WaveCycleGAN: Synthetic-to-natural Speech Waveform Conversion Using Cycle-consistent Adversarial Networks (2018)
- MelGAN: Generative Adversarial Networks For Conditional Waveform Synthesis (2019)
- HiFi-GAN: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)
- GELP: GAN-Excited Linear Prediction For Speech Synthesis From Mel-spectrogram (2019)
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)