Non-parallel Emotion Conversion Using A Deep-generative Hybrid Network And An Adversarial Pair Discriminator
2020 · Ravi Shankar, Jacob Sager, Archana Venkataraman
Abstract
We introduce a novel method for emotion conversion in speech that does not require parallel training data. Our approach loosely relies on a cycle-GAN schema to minimize the reconstruction error from converting back and forth between emotion pairs. However, unlike the conventional cycle-GAN, our discriminator classifies whether a pair of input real and generated samples corresponds to the desired emotion conversion (e.g., A to B) or to its inverse (B to A). We will show that this setup, which we refer to as a variational cycle-GAN (VC-GAN), is equivalent to minimizing the empirical KL divergence between the source features and their cyclic counterpart. In addition, our generator combines a trainable deep network with a fixed generative block to implement a smooth and invertible transformation on the input features, in our case, the fundamental frequency (F0) contour. This hybrid architecture regularizes our adversarial training procedure. We use crowd sourcing to evaluate both the emoti
Related papers
- A Diffeomorphic Flow-based Variational Framework For Multi-speaker Emotion Conversion (2022) 2.26
- Nonparallel Emotional Voice Conversion For Unseen Speaker-emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing (2023) 0.00
- Nonparallel Emotional Speech Conversion (2018) 11.08
- Transforming Spectrum And Prosody For Emotional Voice Conversion With Non-parallel Training Data (2020) 12.54
- In-the-wild Speech Emotion Conversion Using Disentangled Self-supervised Representations And Neural Vocoder-based Resynthesis (2023) 0.00
- Multi-speaker Emotion Conversion Via Latent Variable Regularization And A Chained Encoder-decoder-predictor Network (2020) 5.84
- StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings (2023) 2.26
- Converting Anyone's Emotion: Towards Speaker-independent Emotional Voice Conversion (2020) 11.39