MsEmoTTS: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis
2022 · Yi Lei, Shan Yang, Xinsheng Wang, et al.
Abstract
Expressive synthetic speech is essential for many human-computer interaction and audio broadcast scenarios, and synthesizing expressive speech has therefore attracted much attention in recent years. Previous methods performed expressive speech synthesis either with explicit labels or with a fixed-length style embedding extracted from reference audio. Both approaches can only learn an average style and thus ignore the multi-scale nature of speech prosody. In this paper, we propose MsEmoTTS, a multi-scale emotional speech synthesis framework that models emotion at different levels. Specifically, the proposed method is a typical attention-based sequence-to-sequence model with three proposed modules: a global-level emotion presenting module (GM), an utterance-level emotion presenting module (UM), and a local-level emotion presenting module (LM), which model the global emotion category, utterance-level emotion variation, and syllable-level emotion strength, respectively. In addition t
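To make the division of labour among GM, UM, and LM concrete, here is a minimal Python sketch of how emotion signals at three scales could condition the text-encoder outputs of a sequence-to-sequence model. The function name `condition_on_emotion`, the element-wise addition of the global and utterance vectors, and appending the syllable strength as an extra channel are all hypothetical choices for illustration; the abstract does not say how MsEmoTTS actually fuses these representations.

```python
from typing import List, Sequence

Vector = List[float]


def condition_on_emotion(
    encoder_out: Sequence[Sequence[float]],
    global_emb: Sequence[float],
    utterance_emb: Sequence[float],
    syllable_strength: Sequence[float],
    phone_to_syllable: Sequence[int],
) -> List[Vector]:
    """Hypothetical fusion of three emotion scales into encoder frames.

    - global_emb: one vector for the emotion category (GM-like), shared by all frames
    - utterance_emb: one vector per utterance (UM-like), shared by all frames
    - syllable_strength: one scalar per syllable (LM-like), copied to its frames
    Addition plus an appended strength channel is an assumption, not the paper's method.
    """
    if len(encoder_out) != len(phone_to_syllable):
        raise ValueError("need one syllable index per encoder frame")
    dim = len(global_emb)
    if len(utterance_emb) != dim:
        raise ValueError("global and utterance embeddings must share a dimension")

    fused: List[Vector] = []
    for frame, syl in zip(encoder_out, phone_to_syllable):
        if len(frame) != dim:
            raise ValueError("encoder frames must match the embedding dimension")
        # Global and utterance information is added to every frame.
        summed = [h + g + u for h, g, u in zip(frame, global_emb, utterance_emb)]
        # Local (syllable-level) strength is broadcast to each phone of that syllable.
        summed.append(syllable_strength[syl])
        fused.append(summed)
    return fused


# Three phones: the first two belong to syllable 0, the last to syllable 1.
out = condition_on_emotion(
    encoder_out=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    global_emb=[1.0, 0.0],
    utterance_emb=[0.0, 0.5],
    syllable_strength=[0.2, 0.9],
    phone_to_syllable=[0, 0, 1],
)
```

Because each syllable carries its own strength scalar in this sketch, emotion control amounts to editing `syllable_strength` (for example, raising one syllable's value) before fusion; the abstract above does not state whether MsEmoTTS exposes control in exactly this way.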
Related papers
- Fine-grained Emotion Strength Transfer, Control And Prediction For Emotional Speech Synthesis (2020)
- EmoMix: Emotion Mixing Via Diffusion Models For Emotional Speech Synthesis (2023)
- iEmoTTS: Toward Robust Cross-speaker Emotion Transfer And Control For Speech Synthesis Based On Disentanglement Between Prosody And Timbre (2022)
- PROEMO: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025)
- UMETTS: A Unified Framework For Emotional Text-to-speech Synthesis With Multimodal Prompts (2024)
- EmoSphere-TTS: Emotional Style And Intensity Modeling Via Spherical Emotion Vector For Controllable Emotional Text-to-speech (2024)
- A Methodology For Controlling The Emotional Expressiveness In Synthetic Speech -- A Deep Learning Approach (2019)
- Boosting Multi-speaker Expressive Speech Synthesis With Semi-supervised Contrastive Learning (2023)