Multi-spectrogan: High-diversity And High-fidelity Spectrogram Generation With Adversarial Style Combination For Speech Synthesis
2020 Β· Sang-Hoon Lee, Hyun-Wook Yoon, Hyeong-Rae Noh, et al.
Abstract
While generative adversarial networks (GANs) based neural text-to-speech (TTS) systems have shown significant improvement in neural speech synthesis, there is no TTS system to learn to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require the reconstruction loss compared with the ground-truth and the generated mel-spectrogram directly. In this paper, we present Multi-SpectroGAN (MSG), which can train the multi-speaker model with only the adversarial feedback by conditioning a self-supervised hidden representation of the generator to a conditional discriminator. This leads to better guidance for generator training. Moreover, we also propose adversarial style combination (ASC) for better generalization in the unseen speaking style and transcript, which can learn latent representations of the combined style embedding from multiple mel-spectrograms. Trained with ASC and
Authors
(none)
Tags
Stats
Related papers
- Ganspeech: Adversarial Training For High-fidelity Multi-speaker Speech Synthesis (2021)10.07
- Multi-task Adversarial Training Algorithm For Multi-speaker Neural Text-to-speech (2022)0.00
- High Fidelity Speech Synthesis With Adversarial Networks (2019)0.00
- A Multi-scale Time-frequency Spectrogram Discriminator For Gan-based Non-autoregressive TTS (2022)6.77
- Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework (2017)10.21
- Expediting TTS Synthesis With Adversarial Vocoding (2019)6.77
- Specdiff-gan: A Spectrally-shaped Noise Diffusion GAN For Speech And Music Synthesis (2024)7.81
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)0.00