Parallel WaveGAN: A Fast Waveform Generation Model Based On Generative Adversarial Networks With Multi-resolution Spectrogram
2019 · Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Abstract
We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of the realistic speech waveform. As our method does not require density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveform 28.68 times faster than real-time on a single GPU environment. Perceptual listening test results verify that our proposed method achieves 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparative to the best distillati
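The abstract names a multi-resolution spectrogram loss without spelling it out. The sketch below shows one common formulation, assumed here rather than taken from the text above: at each STFT resolution, a spectral convergence term plus an L1 distance between log magnitudes, averaged over resolutions. The function names, the naive DFT, and the toy `RESOLUTIONS` values are illustrative assumptions; the adversarial term that the generator would also optimize is omitted.

```python
import cmath
import math


def stft_magnitude(signal, fft_size, hop_size):
    """Magnitude spectrogram from a naive DFT with a Hann window (slow, for illustration)."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / fft_size) for n in range(fft_size)]
    frames = []
    for start in range(0, len(signal) - fft_size + 1, hop_size):
        frame = [signal[start + n] * window[n] for n in range(fft_size)]
        bins = []
        for k in range(fft_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                      for n in range(fft_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def stft_loss(real, fake, fft_size, hop_size, eps=1e-7):
    """Spectral convergence plus mean log-magnitude L1 distance at one resolution.

    Assumes both signals have equal length, at least fft_size samples.
    """
    mag_real = stft_magnitude(real, fft_size, hop_size)
    mag_fake = stft_magnitude(fake, fft_size, hop_size)
    diff_sq = real_sq = log_abs = 0.0
    count = 0
    for row_real, row_fake in zip(mag_real, mag_fake):
        for r, f in zip(row_real, row_fake):
            diff_sq += (r - f) ** 2
            real_sq += r ** 2
            log_abs += abs(math.log(r + eps) - math.log(f + eps))
            count += 1
    spectral_convergence = math.sqrt(diff_sq) / (math.sqrt(real_sq) + eps)
    log_magnitude = log_abs / count
    return spectral_convergence + log_magnitude


def multi_resolution_stft_loss(real, fake, resolutions):
    """Average the single-resolution loss over several (fft_size, hop_size) pairs."""
    losses = [stft_loss(real, fake, fft_size, hop) for fft_size, hop in resolutions]
    return sum(losses) / len(losses)


# Toy (fft_size, hop_size) pairs chosen for speed; these are not the paper's settings.
RESOLUTIONS = [(16, 4), (32, 8), (64, 16)]

# A 440 Hz tone sampled at 8 kHz stands in for a reference waveform.
real = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
# A uniformly attenuated copy stands in for generator output.
fake = [0.8 * sample for sample in real]

loss_same = multi_resolution_stft_loss(real, real, RESOLUTIONS)
loss_diff = multi_resolution_stft_loss(real, fake, RESOLUTIONS)
print(loss_same, loss_diff)
```

Combining several window sizes trades time resolution against frequency resolution, so a mismatch that is hard to see at one analysis scale can still be penalized at another. In practice a differentiable STFT from a deep-learning framework would replace the naive DFT used here.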
Related papers
- Probability Density Distillation With Generative Adversarial Networks For High-quality Parallel Waveform Generation (2019)
- Parallel Waveform Synthesis Based On Generative Adversarial Networks With Voicing-aware Conditional Discriminators (2020)
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)
- Wave-U-Net Discriminator: Fast And Lightweight Discriminator For Generative Adversarial Network-based Speech Synthesis (2023)
- Improved Parallel WaveGAN Vocoder With Perceptually Weighted Spectrogram Loss (2021)
- Parallel WaveNet: Fast High-fidelity Speech Synthesis (2017)
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)
- Generative Adversarial Network Based Speaker Adaptation For High Fidelity WaveNet Vocoder (2018)