Nonparallel Emotional Speech Conversion
2018 · Jian Gao, Deep Chakraborty, Hamidou Tembine, et al.
Abstract
We propose a nonparallel data-driven emotional speech conversion method. It enables the transfer of emotion-related characteristics of a speech signal while preserving the speaker's identity and linguistic content. Most existing approaches require parallel data and time alignment, which are not available in most real applications. We achieve nonparallel training based on an unsupervised style transfer technique, which learns a translation model between two distributions instead of a deterministic one-to-one mapping between paired examples. The conversion model consists of an encoder and a decoder for each emotion domain. We assume that the speech signal can be decomposed into an emotion-invariant content code and an emotion-related style code in latent space. Emotion conversion is performed by extracting and recombining the content code of the source speech and the style code of the target emotion. We tested our method on nonparallel corpora with four emotions. Both subjective and obj
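The extract-and-recombine step described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the class `ToyDomainModel`, the function `convert`, the feature and code dimensions, the random untrained linear maps, the time-averaged utterance-level style code, and the domain names `neutral` and `angry` are all assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for this sketch.
FEAT_DIM, CONTENT_DIM, STYLE_DIM = 24, 8, 4


class ToyDomainModel:
    """Hypothetical encoder/decoder pair for one emotion domain.

    Uses random, untrained linear maps; a real system would learn these.
    """

    def __init__(self, rng):
        self.w_content = rng.standard_normal((FEAT_DIM, CONTENT_DIM)) * 0.1
        self.w_style = rng.standard_normal((FEAT_DIM, STYLE_DIM)) * 0.1
        self.w_dec = rng.standard_normal((CONTENT_DIM + STYLE_DIM, FEAT_DIM)) * 0.1

    def encode(self, frames):
        # frames: (T, FEAT_DIM) array of per-frame speech features.
        content = frames @ self.w_content              # (T, CONTENT_DIM), one code per frame
        style = (frames @ self.w_style).mean(axis=0)   # (STYLE_DIM,), assumed utterance-level
        return content, style

    def decode(self, content, style):
        # Repeat the style code for every frame, then map back to feature space.
        tiled = np.broadcast_to(style, (content.shape[0], style.shape[0]))
        return np.concatenate([content, tiled], axis=1) @ self.w_dec


def convert(source_frames, target_reference, source_model, target_model):
    """Recombine the source content code with the target-emotion style code."""
    content, _ = source_model.encode(source_frames)
    _, style = target_model.encode(target_reference)
    return target_model.decode(content, style)


rng = np.random.default_rng(0)
neutral, angry = ToyDomainModel(rng), ToyDomainModel(rng)  # hypothetical domains
source = rng.standard_normal((50, FEAT_DIM))      # stand-in for a source utterance
reference = rng.standard_normal((30, FEAT_DIM))   # stand-in for target-emotion speech
converted = convert(source, reference, neutral, angry)
```

The converted output keeps the frame count of the source utterance, since the content code is per frame, while the style code comes entirely from the target-emotion reference.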
Related papers
- Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset (2020) 16.34
- Textless Speech Emotion Conversion Using Discrete And Decomposed Representations (2021) 10.74
- Converting Anyone's Emotion: Towards Speaker-independent Emotional Voice Conversion (2020) 11.39
- Non-parallel Emotion Conversion Using A Deep-generative Hybrid Network And An Adversarial Pair Discriminator (2020) 6.77
- EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion For Non-parallel And In-the-wild Data (2023) 5.84
- Expressive Voice Conversion: A Joint Framework For Speaker Identity And Emotional Style Transfer (2021) 9.03
- Limited Data Emotional Voice Conversion Leveraging Text-to-speech: Two-stage Sequence-to-sequence Training (2021) 10.35
- Improving Speech Emotion Recognition With Unsupervised Speaking Style Transfer (2022) 6.34