Fine-grained Emotion Strength Transfer, Control And Prediction For Emotional Speech Synthesis
2020 · Yi Lei, Shan Yang, Lei Xie
Abstract
This paper proposes a unified model to conduct emotion transfer, control and prediction for sequence-to-sequence based fine-grained emotional speech synthesis. Conventional emotional speech synthesis often needs manual labels or reference audio to determine the emotional expressions of synthesized speech. Such coarse labels cannot control the details of speech emotion, often resulting in an averaged emotion expression delivery, and it is also hard to choose suitable reference audio during inference. To conduct fine-grained emotion expression generation, we introduce phoneme-level emotion strength representations through a learned ranking function to describe the local emotion details, and the sentence-level emotion category is adopted to render the global emotions of synthesized speech. With the global render and local descriptors of emotions, we can obtain fine-grained emotion expressions from reference audio via its emotion descriptors (for transfer) or directly from phoneme-level ma
Related papers
- MsEmoTTS: Multi-scale Emotion Transfer, Prediction, And Control For Emotional Speech Synthesis (2022)
- StrengthNet: Deep Learning-based Emotion Strength Assessment For Emotional Speech Synthesis (2021)
- Emotional Speech Synthesis With Rich And Granularized Control (2019)
- Fine-grained Emotional Control Of Text-to-speech: Learning To Rank Inter- And Intra-class Emotion Intensities (2023)
- Accurate Emotion Strength Assessment For Seen And Unseen Speech Based On Data-driven Deep Learning (2022)
- iEmoTTS: Toward Robust Cross-speaker Emotion Transfer And Control For Speech Synthesis Based On Disentanglement Between Prosody And Timbre (2022)
- Semi-supervised Learning For Continuous Emotional Intensity Controllable Speech Synthesis With Disentangled Representations (2022)
- Improving Speech Emotion Recognition With Unsupervised Speaking Style Transfer (2022)