Voice Conversion With Diverse Intonation Using Conditional Variational Auto-encoder
2025 · Soobin Suh, Dabi Ahn, Heewoong Park, et al.
Abstract
Voice conversion is the task of synthesizing an utterance in the target speaker's voice while preserving the linguistic information of the source utterance. Although a speaker can produce varying utterances from a single script by using different intonations, conventional voice conversion models are limited to producing only one result per source input. To overcome this limitation, we propose a novel approach for voice conversion with diverse intonations using a conditional variational autoencoder (CVAE). Experiments show that the speaker's style feature can be mapped into a latent space with a Gaussian distribution. We also convert voices with more diverse intonation by making the posterior of the latent space more complex with inverse autoregressive flow (IAF). As a result, the converted voice not only has diverse intonations but also has better sound quality than that of the model without CVAE.
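To make the two latent-space ideas in the abstract concrete, the following is a minimal sketch of a reparameterised Gaussian posterior sample followed by one IAF step. It is an illustration under stated assumptions, not the authors' implementation: the function names `sample_gaussian` and `iaf_step`, the 4-dimensional latent, the random weights, and the purely linear autoregressive network are all hypothetical, and the speaker conditioning of the CVAE is omitted. The gated update form follows the standard IAF formulation.

```python
import math
import random


def sample_gaussian(mu, log_var, rng):
    """Reparameterised draw z0 ~ N(mu, diag(exp(log_var))) and its log-density."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]
    log_q = -0.5 * sum(
        math.log(2.0 * math.pi) + lv + e * e for lv, e in zip(log_var, eps)
    )
    return z, log_q


def iaf_step(z, log_q, w_m, w_s):
    """One IAF step using a linear autoregressive network (an assumption).

    Dimension i is updated from z[0..i-1] only, so the Jacobian is
    triangular and its log-determinant is the sum of the log gates.
    """
    z_new, log_det = [], 0.0
    for i in range(len(z)):
        m_i = sum(w_m[i][j] * z[j] for j in range(i))  # shift term
        s_i = sum(w_s[i][j] * z[j] for j in range(i))  # pre-sigmoid gate
        gate = 1.0 / (1.0 + math.exp(-s_i))
        z_new.append(gate * z[i] + (1.0 - gate) * m_i)
        log_det += math.log(gate)
    # Change of variables: log q(z_new) = log q(z) - log|det dz_new/dz|
    return z_new, log_q - log_det


# Hypothetical toy setup: standard-normal posterior, random flow weights.
rng = random.Random(0)
dim = 4
z0, log_q0 = sample_gaussian([0.0] * dim, [0.0] * dim, rng)
w_m = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
w_s = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
z1, log_q1 = iaf_step(z0, log_q0, w_m, w_s)
```

Drawing several `z0` values for the same source input and decoding each would give several candidate intonations. Stacking more `iaf_step` calls makes the posterior more flexible while keeping its log-density cheap to compute.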
Related papers
- F0-consistent Many-to-many Non-parallel Voice Conversion Via Conditional Autoencoder (2020)
- Voice Conversion With Conditional Samplernn (2018)
- Conditional Deep Hierarchical Variational Autoencoder For Voice Conversion (2021)
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018)
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019)
- Accented Text-to-speech Synthesis With A Conditional Variational Autoencoder (2022)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)
- Many-to-many Voice Conversion Using Cycle-consistent Variational Autoencoder With Multiple Decoders (2019)