Emomix: Emotion Mixing Via Diffusion Models For Emotional Speech Synthesis
2023 Β· Haobin Tang, Xulong Zhang, Jianzong Wang, et al.
Abstract
There has been significant progress in emotional Text-To-Speech (TTS) synthesis technology in recent years. However, existing methods primarily focus on the synthesis of a limited number of emotion types and have achieved unsatisfactory performance in intensity control. To address these limitations, we propose EmoMix, which can generate emotional speech with specified intensity or a mixture of emotions. Specifically, EmoMix is a controllable emotional TTS model based on a diffusion probabilistic model and a pre-trained speech emotion recognition (SER) model used to extract emotion embedding. Mixed emotion synthesis is achieved by combining the noises predicted by diffusion model conditioned on different emotions during only one sampling process at the run-time. We further apply the Neutral and specific primary emotion mixed in varying degrees to control intensity. Experimental results validate the effectiveness of EmoMix for synthesizing mixed emotion and intensity control.
Authors
(none)
Tags
Stats
Related papers
- Emodiff: Intensity Controllable Emotional Text-to-speech With Soft-label Guidance (2022)0.00
- Msemotts: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022)13.97
- PROEMO: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025)0.00
- Emo-dpo: Controllable Emotional Speech Synthesis Through Direct Preference Optimization (2024)9.59
- Emosphere-tts: Emotional Style And Intensity Modeling Via Spherical Emotion Vector For Controllable Emotional Text-to-speech (2024)10.35
- Fine-grained Emotional Control Of Text-to-speech: Learning To Rank Inter- And Intra-class Emotion Intensities (2023)6.77
- Emotion-aligned Generation In Diffusion Text To Speech Models Via Preference-guided Optimization (2025)2.26
- EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion For Non-parallel And In-the-wild Data (2023)5.84