Cross-speaker Emotion Transfer Based On Speaker Condition Layer Normalization And Semi-supervised Training In Text-to-speech
2021 · Pengfei Wu, Junjie Pan, Chenchang Xu, et al.
Abstract
Expressive speech synthesis places high demands on how emotion is rendered. However, collecting an emotional audio corpus for arbitrary speakers is time-consuming, because it depends on each speaker's ability to act out emotions. To address this problem, this paper proposes a cross-speaker emotion transfer method that transfers emotions from a source speaker to a target speaker. A set of emotion tokens is first defined to represent the various emotion categories. The tokens are trained with a cross-entropy loss and a semi-supervised training strategy so that each is highly correlated with its corresponding emotion, which enables controllable synthesis. Meanwhile, to avoid the degradation in timbre similarity caused by cross-speaker emotion transfer, speaker condition layer normalization is used to model speaker characteristics. Experimental results show that the proposed method outperforms the multi-reference based baseline in evaluations of timbre similarity, stability, and emotion perception.
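The abstract does not spell out how speaker condition layer normalization is computed. A common reading of conditional layer normalization is that the usual learned scale and shift are replaced by values predicted from a speaker embedding, so that speaker identity is injected at every normalized layer. The sketch below illustrates that reading only; the function names, the linear projections, and the toy dimensions are assumptions made for illustration, not details taken from the paper.

```python
import math


def linear(weights, bias, x):
    """Affine map y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def speaker_conditional_layer_norm(hidden, speaker_emb,
                                   w_scale, b_scale,
                                   w_shift, b_shift, eps=1e-5):
    """Layer-normalize `hidden`, then apply a speaker-dependent scale and shift.

    Assumption: scale and shift are linear projections of the speaker
    embedding. The paper may parameterize them differently.
    """
    # Standard layer normalization over the feature dimension.
    mean = sum(hidden) / len(hidden)
    var = sum((h - mean) ** 2 for h in hidden) / len(hidden)
    normed = [(h - mean) / math.sqrt(var + eps) for h in hidden]

    # Predict per-feature scale and shift from the speaker embedding.
    scale = linear(w_scale, b_scale, speaker_emb)
    shift = linear(w_shift, b_shift, speaker_emb)

    return [s * n + t for s, n, t in zip(scale, normed, shift)]


# Toy usage: 4 hidden features, 2-dimensional speaker embedding (hypothetical sizes).
hidden = [1.0, 2.0, 3.0, 4.0]
w_scale = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.0, 0.0]]
b_scale = [1.0, 1.0, 1.0, 1.0]
w_shift = [[0.2, 0.0], [0.0, 0.2], [0.0, 0.0], [0.2, 0.2]]
b_shift = [0.0, 0.0, 0.0, 0.0]

out_speaker_a = speaker_conditional_layer_norm(
    hidden, [1.0, 0.0], w_scale, b_scale, w_shift, b_shift)
out_speaker_b = speaker_conditional_layer_norm(
    hidden, [0.0, 1.0], w_scale, b_scale, w_shift, b_shift)
```

With the same hidden vector, the two speaker embeddings yield different outputs, which is the property that lets a shared decoder keep the target speaker's timbre while the emotion is supplied by a separate input.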
Related papers
- Cross-speaker Emotion Disentangling And Transfer For End-to-end Speech Synthesis (2021)
- iEmoTTS: Toward Robust Cross-speaker Emotion Transfer And Control For Speech Synthesis Based On Disentanglement Between Prosody And Timbre (2022)
- Cross-speaker Emotion Transfer For Low-resource Text-to-speech Using Non-parallel Voice Conversion With Pitch-shift Data Augmentation (2022)
- Boosting Multi-speaker Expressive Speech Synthesis With Semi-supervised Contrastive Learning (2023)
- Text-driven Emotional Style Control And Cross-speaker Style Transfer In Neural TTS (2022)
- Multi-speaker Expressive Speech Synthesis Via Multiple Factors Decoupling (2022)
- Fine-grained Emotion Strength Transfer, Control And Prediction For Emotional Speech Synthesis (2020)
- METTS: Multilingual Emotional Text-to-speech By Cross-speaker And Cross-lingual Emotion Transfer (2023)