High-fidelity Speech Synthesis With Minimal Supervision: All Using Diffusion Models
2023 Β· Chunyu Qiang, Hao Li, Yixin Tian, et al.
Abstract
Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations(semantic \& acoustic) and using two sequence-to-sequence tasks to enable training with minimal supervision. However, existing methods suffer from information redundancy and dimension explosion in semantic representation, and high-frequency waveform distortion in discrete acoustic representation. Autoregressive frameworks exhibit typical instability and uncontrollability issues. And non-autoregressive frameworks suffer from prosodic averaging caused by duration prediction models. To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models. The non-autoregressive framework enhances controllability, and the duration diffusion model enables diver
Authors
(none)
Tags
Stats
Related papers
- Minimally-supervised Speech Synthesis With Conditional Diffusion Model And Language Model: A Comparative Study Of Semantic Coding (2023)8.82
- Dmospeech: Direct Metric Optimization Via Distilled Diffusion Model In Zero-shot Speech Synthesis (2024)0.00
- Fastdiff: A Fast Conditional Diffusion Model For High-quality Speech Synthesis (2022)14.35
- Naturalspeech 2: Latent Diffusion Models Are Natural And Zero-shot Speech And Singing Synthesizers (2023)0.00
- Multi-gradspeech: Towards Diffusion-based Multi-speaker Text-to-speech Using Consistent Diffusion Models (2023)0.00
- Diffs2ut: A Semantic Preserving Diffusion Model For Textless Direct Speech-to-speech Translation (2023)2.26
- Schrodinger Bridges Beat Diffusion Models On Text-to-speech Synthesis (2023)0.00
- Diffar: Denoising Diffusion Autoregressive Model For Raw Speech Waveform Generation (2023)0.00