Accented Text-to-speech Synthesis With A Conditional Variational Autoencoder
2022 · Jan Melechovsky, Ambuj Mehrish, Berrak Sisman, et al.
Abstract
Accent plays a significant role in speech communication: it influences how well a listener understands speech and conveys the speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. The framework can synthesize a selected speaker's voice and convert it to any desired target accent. Experiments with both objective and subjective evaluations validate the effectiveness of the proposed framework. The results also show strong performance in the model's ability to manipulate accents in the synthesized speech. Overall, the proposed framework presents a promising avenue for future accented TTS research.
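The abstract does not describe the model's architecture, so the following is only a generic sketch of the conditional VAE idea it names, not the authors' implementation. It assumes a diagonal Gaussian posterior, a standard normal prior, and conditioning by concatenating a one-hot accent code to the latent vector. All function names, dimensions, and values are hypothetical.

```python
import math
import random


def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian posterior.

    Assumption: the prior is a standard normal; a real CVAE may instead
    use a learned prior that depends on the condition.
    """
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def decoder_input(z, accent_id, num_accents):
    """Condition the decoder by appending a one-hot accent code to z."""
    return z + one_hot(accent_id, num_accents)


rng = random.Random(0)
mu, log_var = [0.2, -0.1], [0.0, -1.0]  # hypothetical encoder outputs
z = reparameterize(mu, log_var, rng)

# The same latent sample paired with two different (hypothetical) accent labels.
x_a = decoder_input(z, accent_id=0, num_accents=3)
x_b = decoder_input(z, accent_id=2, num_accents=3)
kl = kl_to_standard_normal(mu, log_var)
```

In this toy setup, changing only `accent_id` changes the decoder's conditioning while the latent sample stays fixed, which is one common way a conditional VAE exposes a controllable attribute.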
Related papers
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- Voice Conversion With Diverse Intonation Using Conditional Variational Auto-encoder (2025)
- DART: Disentanglement Of Accent And Speaker Representation In Multispeaker Text-to-speech (2024)
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021)
- Audio-visual Speech Enhancement Using Conditional Variational Auto-encoders (2019)
- Text-to-speech Synthesis Based On Latent Variable Conversion Using Diffusion Probabilistic Model And Variational Autoencoder (2022)
- Expressive Speech Synthesis Via Modeling Expressions With Variational Autoencoder (2018)
- Hierarchical Generative Modeling For Controllable Speech Synthesis (2018)