Emoreg: Directional Latent Vector Modeling For Emotional Intensity Regularization In Diffusion-based Voice Conversion
2024 · Ashishkumar Gudmalwar, Ishan D. Biyani, Nirmesh Shah, et al.
Abstract
Emotional Voice Conversion (EVC) aims to convert the discrete emotional state of a given speech utterance from the source emotion to the target emotion while preserving its linguistic content. In this paper, we propose regularizing emotion intensity in a diffusion-based EVC framework to generate precise speech of the target emotion. Traditional approaches control the intensity of an emotional state in the utterance via emotion class probabilities or intensity labels, which often leads to inept style manipulation and degraded quality. In contrast, we aim to regulate emotion intensity using self-supervised learning-based feature representations and unsupervised directional latent vector modeling (DVM) in the emotional embedding space within a diffusion-based framework. These emotion embeddings can be modified based on the given target emotion intensity and the corresponding direction vector. Furthermore, the updated embeddings can be fused in the reverse diffusion process to generate
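The embedding update described in the abstract (shifting an emotion embedding along a direction vector, scaled by a target intensity) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names are hypothetical, the toy 3-dimensional vectors stand in for real emotion embeddings, and the direction is taken as the normalized difference of class means, which is a simple stand-in and not the unsupervised DVM procedure of the paper.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def direction_vector(source_embs, target_embs):
    """Unit vector pointing from the mean source embedding to the mean target embedding.

    Assumption: a mean-difference direction, used here only as a stand-in
    for a learned direction vector.
    """
    src, tgt = mean_vector(source_embs), mean_vector(target_embs)
    diff = [t - s for s, t in zip(src, tgt)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("source and target means coincide; direction undefined")
    return [x / norm for x in diff]

def shift_embedding(embedding, direction, intensity):
    """Move an embedding along `direction` by `intensity` (hypothetical scalar control)."""
    return [e + intensity * d for e, d in zip(embedding, direction)]

# Toy embeddings (hypothetical values, not taken from any dataset).
neutral = [[0.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
happy = [[3.0, 5.0, 1.0], [3.0, 5.0, 1.0]]

direction = direction_vector(neutral, happy)
weak = shift_embedding(neutral[0], direction, 0.5)    # mild shift toward the target emotion
strong = shift_embedding(neutral[0], direction, 2.0)  # stronger shift along the same direction
```

In this sketch, a larger intensity value moves the embedding further along the same direction; how the shifted embedding is then fused into the reverse diffusion process is not specified in the text above and is not modeled here.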
Related papers
- Towards Realistic Emotional Voice Conversion Using Controllable Emotional Intensity (2024)
- Mixed-evc: Mixed Emotion Synthesis And Control In Voice Conversion (2022)
- Converting Anyone's Voice: End-to-end Expressive Voice Conversion With A Conditional Diffusion Model (2024)
- ZSDEVC: Zero-shot Diffusion-based Emotional Voice Conversion With Disentangled Mechanism (2024)
- An Overview & Analysis Of Sequence-to-sequence Emotional Voice Conversion (2022)
- Limited Data Emotional Voice Conversion Leveraging Text-to-speech: Two-stage Sequence-to-sequence Training (2021)
- Nonparallel Emotional Voice Conversion For Unseen Speaker-emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing (2023)
- EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion For Non-parallel And In-the-wild Data (2023)