Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks
2017 · Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari
Abstract
A method for statistical parametric speech synthesis incorporating generative adversarial networks (GANs) is proposed. Although powerful deep neural network (DNN) techniques can be applied to artificially synthesize speech waveforms, the quality of the synthetic speech is low compared with that of natural speech. One of the issues causing the quality degradation is an over-smoothing effect often observed in the generated speech parameters. The GAN introduced in this paper consists of two neural networks: a discriminator that distinguishes natural samples from generated ones, and a generator that tries to deceive the discriminator. In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator. Since the objective of the GANs is to minimize the divergence (i.e., distribution diffe
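The training objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes mean squared error as a stand-in for the generation loss, a generator-side adversarial loss of the form -log D(generated), and a discriminator that outputs one probability per frame. All function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def generation_loss(natural, generated):
    """Mean squared error between natural and generated speech parameters.

    Assumption: MSE stands in for the conventional generation loss.
    """
    return float(np.mean((natural - generated) ** 2))


def adversarial_loss(disc_prob_generated, eps=1e-12):
    """Generator-side loss: small when the discriminator rates generated
    frames as natural (probability close to 1)."""
    return float(-np.mean(np.log(disc_prob_generated + eps)))


def discriminator_loss(disc_prob_natural, disc_prob_generated, eps=1e-12):
    """Binary cross-entropy with natural frames labelled 1 and generated
    frames labelled 0."""
    natural_term = -np.mean(np.log(disc_prob_natural + eps))
    generated_term = -np.mean(np.log(1.0 - disc_prob_generated + eps))
    return float(natural_term + generated_term)


def acoustic_model_loss(natural, generated, disc_prob_generated, weight=1.0):
    """Weighted sum of the generation loss and the adversarial loss.

    `weight` is a hypothetical hyperparameter balancing the two terms.
    """
    return generation_loss(natural, generated) + weight * adversarial_loss(
        disc_prob_generated
    )


# Toy example: two frames, two parameters per frame.
natural = np.array([[0.0, 1.0], [2.0, 3.0]])
generated = np.array([[0.0, 1.0], [2.0, 1.0]])
d_natural = np.array([0.9, 0.8])    # discriminator outputs for natural frames
d_generated = np.array([0.5, 0.5])  # discriminator outputs for generated frames

total = acoustic_model_loss(natural, generated, d_generated, weight=2.0)
d_loss = discriminator_loss(d_natural, d_generated)
print(total, d_loss)
```

In an actual training loop the two losses would be minimized alternately: `discriminator_loss` with respect to the discriminator and `acoustic_model_loss` with respect to the acoustic model.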
Related papers
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019)
- Reducing Over-smoothness In Speech Synthesis Using Generative Adversarial Networks (2018)
- Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework (2017)
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- High Fidelity Speech Synthesis With Adversarial Networks (2019)
- End-to-end Video-to-speech Synthesis Using Generative Adversarial Networks (2021)
- Video-driven Speech Reconstruction Using Generative Adversarial Networks (2019)
- GANSpeech: Adversarial Training For High-fidelity Multi-speaker Speech Synthesis (2021)