Voice Conversion Based On Cross-domain Features Using Variational Auto Encoders
2018 · Wen-Chin Huang, Hsin-Te Hwang, Yu-Huai Peng, et al.
Abstract
An effective approach to non-parallel voice conversion (VC) is to utilize deep neural networks (DNNs), specifically variational auto encoders (VAEs), to model the latent structure of speech in an unsupervised manner. A previous study has confirmed the effectiveness of VAE using STRAIGHT spectra for VC. However, VAE using other types of spectral features, such as mel-cepstral coefficients (MCCs), which are related to human perception and have been widely used in VC, has not been properly investigated. Instead of relying on one specific type of spectral feature, VAE may benefit from using multiple types of spectral features simultaneously, thereby improving its capability for VC. To this end, we propose a novel VAE framework for VC, called cross-domain VAE (CDVAE). Specifically, the proposed framework utilizes both STRAIGHT spectra and MCCs by explicitly regularizing multiple objectives in order to constrain the behavior of the learned encoder and decoder.
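The idea of "regularizing multiple objectives" across feature domains can be illustrated with a toy loss. The sketch below is a minimal illustration under stated assumptions, not the authors' architecture or exact objective: it assumes one Gaussian encoder per domain (STRAIGHT spectra and MCCs), one decoder per domain conditioned on a one-hot speaker code, and a loss that asks each latent code to reconstruct both feature types (within-domain and cross-domain terms) plus a KL term per encoder. Single linear layers stand in for the DNNs, mean squared error stands in for the Gaussian reconstruction likelihood, the feature sizes are hypothetical, and no training loop is shown. All function names (`init_params`, `cdvae_style_loss`, `convert`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the paper).
SP_DIM, MCC_DIM, Z_DIM, N_SPK = 513, 24, 16, 4


def init_params(rng):
    """Random linear weights standing in for the DNN encoders/decoders."""
    def w(i, o):
        return rng.normal(scale=0.01, size=(i, o))
    return {
        # One Gaussian encoder per feature domain: x -> (mu, log_var).
        "enc_sp": (w(SP_DIM, Z_DIM), w(SP_DIM, Z_DIM)),
        "enc_mcc": (w(MCC_DIM, Z_DIM), w(MCC_DIM, Z_DIM)),
        # One decoder per domain, conditioned on a one-hot speaker code.
        "dec_sp": w(Z_DIM + N_SPK, SP_DIM),
        "dec_mcc": w(Z_DIM + N_SPK, MCC_DIM),
    }


def encode(params, name, x):
    w_mu, w_lv = params[name]
    return x @ w_mu, x @ w_lv


def decode(params, name, z, spk):
    # Concatenate latent code and speaker code, then project to features.
    return np.concatenate([z, spk], axis=1) @ params[name]


def sample(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps.
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over frames.
    return float(np.mean(-0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=1)))


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def cdvae_style_loss(params, sp, mcc, spk, rng):
    """Within-domain + cross-domain reconstruction plus KL for both encoders."""
    mu_s, lv_s = encode(params, "enc_sp", sp)
    mu_m, lv_m = encode(params, "enc_mcc", mcc)
    z_s, z_m = sample(mu_s, lv_s, rng), sample(mu_m, lv_m, rng)
    recon = 0.0
    for z in (z_s, z_m):
        # Each latent code must reconstruct BOTH feature types.
        recon += mse(sp, decode(params, "dec_sp", z, spk))
        recon += mse(mcc, decode(params, "dec_mcc", z, spk))
    kl = kl_to_standard_normal(mu_s, lv_s) + kl_to_standard_normal(mu_m, lv_m)
    return recon + kl


def convert(params, mcc, target_spk):
    # Conversion: encode source frames, decode with the target speaker's code.
    mu, _ = encode(params, "enc_mcc", mcc)
    return decode(params, "dec_mcc", mu, target_spk)


# Usage on random stand-in data (8 frames, all from speaker 0).
rng = np.random.default_rng(0)
params = init_params(rng)
frames = 8
sp = rng.random((frames, SP_DIM))
mcc = rng.standard_normal((frames, MCC_DIM))
spk = np.eye(N_SPK)[np.zeros(frames, dtype=int)]
loss = cdvae_style_loss(params, sp, mcc, spk, rng)
```

The cross-domain terms are what tie the two encoders to a shared latent space: a code produced from spectra must also be decodable into MCCs and vice versa, which is one plausible reading of how multiple objectives can constrain both the encoder and the decoder.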
Related papers
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018)
- Conditional Deep Hierarchical Variational Autoencoder For Voice Conversion (2021)
- Voice Conversion From Non-parallel Corpora Using Variational Auto-encoder (2016)
- Investigation Of F0 Conditioning And Fully Convolutional Networks In Variational Autoencoder Based Voice Conversion (2019)
- Fastvc: Fast Voice Conversion With Non-parallel Data (2020)
- Many-to-many Voice Conversion Based Feature Disentanglement Using Variational Autoencoder (2021)