Multi-task Adversarial Training Algorithm For Multi-speaker Neural Text-to-speech
2022 · Yusuke Nakai, Yuki Saito, Kenta Udagawa, et al.
Abstract
We propose a novel training algorithm for a multi-speaker neural text-to-speech (TTS) model based on multi-task adversarial training. A conventional generative adversarial network (GAN)-based training algorithm significantly improves the quality of synthetic speech by reducing the statistical difference between natural and synthetic speech. However, such an algorithm does not guarantee that the trained TTS model generalizes to the voices of unseen speakers, i.e., speakers who are not included in the training data. Our algorithm alternately trains two deep neural networks: a multi-task discriminator and a multi-speaker neural TTS model (i.e., the generator of the GAN). The discriminator is trained not only to distinguish between natural and synthetic speech but also to verify whether the speaker of the input speech is existent or non-existent (i.e., newly generated by interpolating seen speakers' embedding vectors). Meanwhile, the generator is trained to minimize the weighted sum of the speech reconst…
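The abstract names three ingredients: "non-existent" speakers made by interpolating seen speakers' embedding vectors, a discriminator with two binary tasks (natural vs. synthetic, existent vs. non-existent speaker), and a generator loss that is a weighted sum including a reconstruction term. The following is a minimal, framework-free sketch of how these losses could fit together. It is not the authors' implementation. It assumes scalar discriminator probabilities, binary cross-entropy for both discriminator tasks, an L1 reconstruction loss, and linear interpolation between two embeddings. All function names and the weights `w_adv` and `w_spk` are hypothetical. Because the abstract is cut off before the full generator objective, the generator's speaker-task term is also an assumption.

```python
# Hypothetical sketch: loss shapes (BCE, L1) and names are assumptions,
# not taken from the paper.
import math
import random
from typing import List, Sequence

EPS = 1e-7  # clamp probabilities so log() stays finite


def interpolate_speakers(e_a: Sequence[float], e_b: Sequence[float],
                         alpha: float) -> List[float]:
    """Blend two seen-speaker embeddings into a new, 'non-existent' speaker."""
    if len(e_a) != len(e_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(e_a, e_b)]


def bce(prob: float, target: float) -> float:
    """Binary cross-entropy for a single discriminator output probability."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and natural acoustic features."""
    if len(pred) != len(target) or not pred:
        raise ValueError("feature sequences must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def discriminator_loss(p_nat: float, p_syn: float,
                       p_seen: float, p_new: float) -> float:
    """Multi-task discriminator objective (assumed BCE for both tasks).

    p_nat / p_syn: probability that natural / synthetic speech is natural.
    p_seen / p_new: probability that speech of a seen / interpolated speaker
    comes from an existent speaker.
    """
    adv = bce(p_nat, 1.0) + bce(p_syn, 0.0)   # task 1: natural vs. synthetic
    spk = bce(p_seen, 1.0) + bce(p_new, 0.0)  # task 2: existent vs. non-existent
    return adv + spk


def generator_loss(recon: float, p_syn: float, p_new: float,
                   w_adv: float = 1.0, w_spk: float = 1.0) -> float:
    """Weighted sum of reconstruction and adversarial terms (hypothetical weights)."""
    # The generator tries to make synthetic speech look natural and
    # interpolated speakers look existent to the discriminator.
    return recon + w_adv * bce(p_syn, 1.0) + w_spk * bce(p_new, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    spk_a, spk_b = [0.2, -0.4, 1.0], [0.6, 0.0, -1.0]
    new_spk = interpolate_speakers(spk_a, spk_b, rng.uniform(0.0, 1.0))
    recon = l1_loss([0.1, 0.5, 0.9], [0.0, 0.5, 1.0])
    print("interpolated embedding:", new_spk)
    print("D loss:", discriminator_loss(0.9, 0.2, 0.8, 0.3))
    print("G loss:", generator_loss(recon, p_syn=0.2, p_new=0.3,
                                    w_adv=0.1, w_spk=0.1))
```

In a real training loop these scalars would be batched tensors, and the two losses would be minimized in alternation, as the abstract describes: one discriminator step, then one generator step. Sampling a fresh `alpha` per step is one plausible way to expose the discriminator to many unseen speakers, but the paper's actual sampling scheme is not stated here.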
Related papers
- GANSpeech: Adversarial Training For High-fidelity Multi-speaker Speech Synthesis (2021)
- Multi-SpectroGAN: High-diversity And High-fidelity Spectrogram Generation With Adversarial Style Combination For Speech Synthesis (2020)
- High Fidelity Speech Synthesis With Adversarial Networks (2019)
- Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework (2017)
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018)
- Speaker- And Age-invariant Training For Child Acoustic Modeling Using Adversarial Multi-task Learning (2022)
- Expediting TTS Synthesis With Adversarial Vocoding (2019)
- End-to-end Adversarial Text-to-speech (2020)