Multi-task WaveNet: A Multi-task Generative Model For Statistical Parametric Speech Synthesis Without Fundamental Frequency Conditions
2018 · Yu Gu, Yongguo Kang
Abstract
This paper introduces an improved generative model for statistical parametric speech synthesis (SPSS) based on WaveNet under a multi-task learning framework. Unlike the original WaveNet, the proposed Multi-task WaveNet uses frame-level acoustic feature prediction as a secondary task, so the external fundamental frequency prediction model that the original WaveNet requires can be removed. The improved WaveNet can therefore generate high-quality speech waveforms conditioned only on linguistic features. By addressing the accumulation of pitch prediction errors, Multi-task WaveNet produces more natural and expressive speech, and its inference procedure is simpler than that of the original WaveNet. Experimental results show that the proposed SPSS method outperforms the state-of-the-art approach based on the original WaveNet in both objective measures and subjective preference tests.
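The multi-task idea in the abstract can be illustrated with a minimal sketch of a combined training objective. The abstract does not state the loss functions, so everything here is an assumption: the primary task is scored with cross-entropy over quantized waveform samples (as in the original WaveNet), the secondary frame-level acoustic feature task is scored with mean squared error, and `aux_weight` is a hypothetical weighting hyperparameter. The function names are illustrative, not the authors'.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one categorical prediction (e.g. one quantized sample)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def mean_squared_error(predicted, target):
    """Mean squared error over one frame of acoustic features."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def multi_task_loss(sample_logits, sample_targets,
                    acoustic_pred, acoustic_target, aux_weight=0.5):
    """Primary waveform loss plus a weighted secondary acoustic-feature loss.

    ASSUMPTIONS (not from the paper): cross-entropy for the primary task,
    MSE for the secondary task, and a scalar aux_weight to combine them.
    Returns (total, primary, secondary).
    """
    primary = sum(
        softmax_cross_entropy(logits, target)
        for logits, target in zip(sample_logits, sample_targets)
    ) / len(sample_targets)
    secondary = sum(
        mean_squared_error(pred, target)
        for pred, target in zip(acoustic_pred, acoustic_target)
    ) / len(acoustic_target)
    return primary + aux_weight * secondary, primary, secondary


if __name__ == "__main__":
    # Toy inputs: one waveform sample with 4 quantization levels,
    # one acoustic frame with 2 feature dimensions.
    total, primary, secondary = multi_task_loss(
        sample_logits=[[0.0, 0.0, 0.0, 0.0]],
        sample_targets=[2],
        acoustic_pred=[[0.0, 0.0]],
        acoustic_target=[[1.0, 3.0]],
    )
    print(total, primary, secondary)
```

In a setup like this, the secondary loss would only shape training; at synthesis time the model would be conditioned on linguistic features alone, which is consistent with the abstract's claim that no external fundamental frequency predictor is needed.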
Related papers
- Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework (2017) 10.21
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019) 10.35
- WaveNet: A Generative Model For Raw Audio (2016) 0.00
- Neural Source-filter-based Waveform Model For Statistical Parametric Speech Synthesis (2018) 13.97
- Wasserstein GAN And Waveform Loss-based Acoustic Model Training For Multi-speaker Text-to-speech Synthesis Systems Using A WaveNet Vocoder (2018) 12.61
- Parallel WaveNet: Fast High-fidelity Speech Synthesis (2017) 0.00
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018) 8.35
- ExcitNet Vocoder: A Neural Excitation Model For Parametric Speech Synthesis Systems (2018) 9.76