Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis And Any-to-any Voice Conversion
2022 · Yi Lei, Shan Yang, Jian Cong, et al.
Abstract
Zero-shot speech generation aims to synthesize a novel, unseen voice from only one utterance of the target speaker. Although adapting to new voices in the zero-shot scenario is challenging in both stages, acoustic modeling and vocoding, previous works usually address only one of them. In this paper, we extend our previous Glow-WaveGAN to Glow-WaveGAN 2, aiming to solve the problem in both stages for high-quality zero-shot text-to-speech and any-to-any voice conversion. We first build a universal WaveGAN model that extracts the latent distribution \(p(z)\) of speech and reconstructs the waveform from it. A flow-based acoustic model then only needs to learn the same \(p(z)\) from text, which naturally avoids the mismatch between the acoustic model and the vocoder and yields high-quality generated speech without model fine-tuning. Based on a continuous speaker space and the reversible property of flows, the conditional distribution can be obtained for
Related papers
- Glow-WaveGAN: Learning Speech Representations From GAN-based Variational Auto-encoder For High Fidelity Flow-based Speech Synthesis (2021) 8.35
- StarGAN-ZSVC: Towards Zero-shot Voice Conversion In Low-resource Contexts (2021) 3.58
- WaveGlow: A Flow-based Generative Network For Speech Synthesis (2018) 20.65
- GAZEV: GAN-based Zero-shot Voice Conversion Over Non-parallel Speech Corpus (2020) 8.60
- Learning Noise-independent Speech Representation For High-quality Voice Conversion For Noisy Target Speakers (2022) 3.58
- SLMGAN: Exploiting Speech Language Model Representations For Unsupervised Zero-shot Voice Conversion In GANs (2023) 0.00
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018) 8.35
- SqueezeWave: Extremely Lightweight Vocoders For On-device Speech Synthesis (2020) 4.81