Converting Anyone's Emotion: Towards Speaker-independent Emotional Voice Conversion
2020 · Kun Zhou, Berrak Sisman, Mingyang Zhang, et al.
Abstract
Emotional voice conversion aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. Prior studies on emotional voice conversion are mostly carried out under the assumption that emotion is speaker-dependent. We consider that there is a common code between speakers for emotional expression in a spoken language; therefore, a speaker-independent mapping between emotional states is possible. In this paper, we propose a speaker-independent emotional voice conversion framework that can convert anyone's emotion without the need for parallel data. We propose a VAW-GAN based encoder-decoder structure to learn the spectrum and prosody mapping. We perform prosody conversion by using continuous wavelet transform (CWT) to model the temporal dependencies. We also investigate the use of F0 as an additional input to the decoder to improve emotion conversion performance. Experiments show that the proposed speaker-independent frame
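To make the CWT step mentioned in the abstract more concrete, the following is a minimal sketch of decomposing a normalized log-F0 contour into components at several time scales. The abstract does not specify the details, so the Mexican hat mother wavelet, the ten dyadic scales, and the function names `mexican_hat` and `cwt_f0` are assumptions made for illustration only, not the authors' implementation. Interpolation over unvoiced frames is omitted.

```python
import math

def mexican_hat(t):
    """Mexican hat mother wavelet (normalization constant omitted)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_f0(contour, num_scales=10, base_scale=1.0):
    """Direct discretized CWT of a 1-D contour.

    Computes W(a, b) = a^(-1/2) * sum_k x[k] * psi((k - b) / a)
    for dyadic scales a = base_scale * 2^i (assumed, not from the source).
    Returns a list of num_scales rows, each as long as the contour.
    """
    n = len(contour)
    components = []
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        row = []
        for b in range(n):
            acc = 0.0
            for k in range(n):
                acc += contour[k] * mexican_hat((k - b) / scale)
            row.append(acc / math.sqrt(scale))
        components.append(row)
    return components

# Synthetic F0 contour in Hz (hypothetical data), 100 frames.
f0 = [120.0 + 20.0 * math.sin(2.0 * math.pi * t / 25.0) for t in range(100)]
log_f0 = [math.log(v) for v in f0]

# Zero-mean, unit-variance normalization before the transform.
mean = sum(log_f0) / len(log_f0)
std = math.sqrt(sum((v - mean) ** 2 for v in log_f0) / len(log_f0))
norm = [(v - mean) / std for v in log_f0]

components = cwt_f0(norm)
print(len(components), len(components[0]))
```

Small scales respond to fast, local pitch movements and large scales to slow, phrase-level trends, which is one way a frame-level model can be given access to longer temporal dependencies in prosody.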
Related papers
- VAW-GAN For Disentanglement And Recomposition Of Emotional Elements In Speech (2020) · 10.74
- Expressive Voice Conversion: A Joint Framework For Speaker Identity And Emotional Style Transfer (2021) · 9.03
- Transforming Spectrum And Prosody For Emotional Voice Conversion With Non-parallel Training Data (2020) · 12.54
- Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset (2020) · 16.34
- Nonparallel Emotional Speech Conversion (2018) · 11.08
- In-the-wild Speech Emotion Conversion Using Disentangled Self-supervised Representations And Neural Vocoder-based Resynthesis (2023) · 0.00
- A Diffeomorphic Flow-based Variational Framework For Multi-speaker Emotion Conversion (2022) · 2.26
- Decoupling Speaker-independent Emotions For Voice Conversion Via Source-filter Networks (2021) · 9.41