Conditional Deep Hierarchical Variational Autoencoder For Voice Conversion
2021 · Kei Akuzawa, Kotaro Onishi, Keisuke Takiguchi, et al.
Abstract
Variational autoencoder-based voice conversion (VAE-VC) has the advantage of requiring only pairs of speech and speaker labels for training. Unlike most VAE-VC research, which focuses on auxiliary losses or on discretizing latent variables, this paper investigates how increasing model expressiveness benefits and otherwise affects VAE-VC. Specifically, we first analyze VAE-VC from a rate-distortion perspective and point out that model expressiveness matters for VAE-VC because rate and distortion reflect the similarity and naturalness of converted speech. Based on this analysis, we propose a novel VC method using a deep hierarchical VAE, which combines high model expressiveness with fast conversion thanks to its non-autoregressive decoder. Our analysis also reveals a further problem: similarity can degrade when the latent variable of a VAE carries redundant information. We address the problem by controlling the information contained in t
Related papers
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018) · 14.69
- Many-to-many Voice Conversion Based Feature Disentanglement Using Variational Autoencoder (2021) · 7.81
- Many-to-many Voice Conversion Using Cycle-consistent Variational Autoencoder With Multiple Decoders (2019) · 6.34
- Investigation Of F0 Conditioning And Fully Convolutional Networks In Variational Autoencoder Based Voice Conversion (2019) · 0.00
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017) · 16.34
- Voice Conversion With Diverse Intonation Using Conditional Variational Auto-encoder (2025) · 0.00
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion (2022) · 10.97
- Voice Conversion Based On Cross-domain Features Using Variational Auto Encoders (2018) · 11.29