Decoupling Speaker-independent Emotions For Voice Conversion Via Source-filter Networks
2021 Β· Zhaojie Luo, Shoufeng Lin, Rui Liu, et al.
Abstract
Emotional voice conversion (VC) aims to convert a neutral voice into an emotional one (e.g., happy) while retaining the linguistic information and speaker identity. We note that decoupling emotional features from other speech information (such as speaker and content) is the key to achieving remarkable performance. Recent attempts at speech representation decoupling on neutral speech do not work well on emotional speech, owing to the more complex acoustic properties of the latter. To address this problem, we propose a novel Source-Filter-based Emotional VC model (SFEVC) that properly filters speaker-independent emotion features from both the timbre and pitch features. Our SFEVC model consists of multi-channel encoders, emotion-separate encoders, and one decoder. All encoder modules adopt a specially designed information-bottleneck auto-encoder. Additionally, to further improve the conversion quality for various emotions, a novel two-stage tr…
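The abstract describes encoders that each squeeze one speech stream through an information bottleneck before a shared decoder. The following is a minimal, hypothetical sketch of that general idea in the style of earlier bottleneck auto-encoders such as AutoVC and SpeechSplit; it is not the SFEVC implementation. A narrow channel projection plus temporal downsampling limits how much information each code can carry, and the decoder side upsamples and concatenates the codes. All function names, dimensions, and downsampling factors are assumptions, the random weights stand in for learned ones, and the two toy streams only loosely mirror the source-filter split (F0 contour as source, spectral frames as filter/timbre).

```python
import random
from typing import List

Frame = List[float]  # one time step of a feature stream (e.g. a spectral frame)


def project(frames: List[Frame], out_dim: int, seed: int = 0) -> List[Frame]:
    """Channel bottleneck: map each frame linearly to `out_dim` channels.

    The weights are random stand-ins (assumption); a trained encoder learns them.
    """
    rng = random.Random(seed)
    in_dim = len(frames[0])
    weights = [
        [rng.gauss(0.0, 1.0 / in_dim ** 0.5) for _ in range(in_dim)]
        for _ in range(out_dim)
    ]
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights] for frame in frames]


def downsample(frames: List[Frame], factor: int) -> List[Frame]:
    """Temporal bottleneck: average non-overlapping groups of `factor` frames."""
    codes = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        codes.append([sum(channel) / len(group) for channel in zip(*group)])
    return codes


def upsample(codes: List[Frame], factor: int, length: int) -> List[Frame]:
    """Decoder side: repeat each code `factor` times and trim to `length` frames."""
    return [code for code in codes for _ in range(factor)][:length]


def bottleneck_encode(frames: List[Frame], out_dim: int, factor: int) -> List[Frame]:
    """Hypothetical encoder: channel bottleneck followed by temporal bottleneck."""
    return downsample(project(frames, out_dim), factor)


def combine(streams: List[List[Frame]], length: int) -> List[Frame]:
    """Concatenate the per-frame codes of several encoders into decoder input."""
    return [sum((stream[t] for stream in streams), []) for t in range(length)]


if __name__ == "__main__":
    n_frames = 10
    # Toy "filter" stream (80-dim spectral frames) and "source" stream (1-dim F0 contour).
    spectral = [[float((t * 7 + d) % 5) for d in range(80)] for t in range(n_frames)]
    f0 = [[100.0 + 2.0 * t] for t in range(n_frames)]

    timbre_codes = bottleneck_encode(spectral, out_dim=4, factor=4)  # 3 codes x 4 dims
    pitch_codes = bottleneck_encode(f0, out_dim=1, factor=2)         # 5 codes x 1 dim

    decoder_input = combine(
        [upsample(timbre_codes, 4, n_frames), upsample(pitch_codes, 2, n_frames)],
        n_frames,
    )
    print(len(timbre_codes), len(pitch_codes), len(decoder_input), len(decoder_input[0]))
```

Running it prints `3 5 10 5`: three 4-dimensional timbre codes, five 1-dimensional pitch codes, and a 10-frame, 5-dimensional decoder input. In bottleneck auto-encoders of this kind, the channel width and downsampling factor are the main knobs: too wide and unwanted information (e.g., speaker identity) leaks into a code, too narrow and the decoder cannot reconstruct the speech.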
Related papers
- Converting Anyone's Emotion: Towards Speaker-independent Emotional Voice Conversion (2020) · 11.39
- Mixed-EVC: Mixed Emotion Synthesis And Control In Voice Conversion (2022) · 4.52
- Expressive Voice Conversion: A Joint Framework For Speaker Identity And Emotional Style Transfer (2021) · 9.03
- An Overview & Analysis Of Sequence-to-sequence Emotional Voice Conversion (2022) · 8.60
- Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset (2020) · 16.34
- StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings (2023) · 2.26
- Converting Anyone's Voice: End-to-end Expressive Voice Conversion With A Conditional Diffusion Model (2024) · 5.24
- Limited Data Emotional Voice Conversion Leveraging Text-to-speech: Two-stage Sequence-to-sequence Training (2021) · 10.35