Voice Conversion From Non-parallel Corpora Using Variational Auto-encoder
2016 · Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, et al.
Abstract
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence, either to learn conversion functions or to synthesize a target spectrum with the aid of alignments. However, these requirements gravely limit the practical applications of SC, because parallel corpora are scarce or even unavailable. We propose an SC framework based on a variational auto-encoder that enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct spectra of a designated speaker. It removes the requirement of parallel corpora or phonetic alignments for training a spectral conversion system. We report objective and subjective evaluations to validate the proposed method and compare it with SC methods that have access to aligned corpora.
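The encoder/decoder split described in the abstract can be illustrated with a minimal NumPy sketch. It assumes frame-wise spectral vectors, a speaker-independent encoder that sees only the spectrum, a decoder conditioned on a one-hot speaker code, single untrained linear layers, and a unit-variance Gaussian likelihood (so reconstruction reduces to squared error). The sizes `D`, `Z`, `K` and the functions `encode`, `decode`, `negative_elbo`, and `convert` are hypothetical names chosen for illustration; this is not the authors' architecture or training setup. Conversion works by encoding a source frame and decoding it with the target speaker's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D spectral bins per frame, Z latent dims, K speakers.
D, Z, K = 8, 4, 2

# Untrained, randomly initialised linear weights (illustration only).
W_mu = rng.normal(scale=0.1, size=(D, Z))
W_logvar = rng.normal(scale=0.1, size=(D, Z))
W_dec = rng.normal(scale=0.1, size=(Z + K, D))


def encode(x):
    """Speaker-independent encoder: frames x (N, D) -> mean and log-variance of q(z|x)."""
    return x @ W_mu, x @ W_logvar


def decode(z, speaker_id):
    """Decoder conditioned on a one-hot speaker code, broadcast to every frame."""
    y = np.broadcast_to(np.eye(K)[speaker_id], z.shape[:-1] + (K,))
    return np.concatenate([z, y], axis=-1) @ W_dec


def negative_elbo(x, speaker_id):
    """Per-frame negative ELBO, averaged over frames (up to an additive constant)."""
    mu, logvar = encode(x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps  # reparameterisation trick
    x_hat = decode(z, speaker_id)
    # Unit-variance Gaussian likelihood -> half squared error
    recon = 0.5 * np.sum((x - x_hat) ** 2, axis=-1)
    # Closed-form KL(N(mu, sigma^2) || N(0, I))
    kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)
    return np.mean(recon + kl)


def convert(x_source, target_id):
    """Encode source frames, then decode with the target speaker's code."""
    mu, _ = encode(x_source)  # posterior mean used at conversion time
    return decode(mu, target_id)


frames = rng.standard_normal((5, D))
loss = negative_elbo(frames, speaker_id=0)  # training signal for speaker 0's own frames
converted = convert(frames, target_id=1)    # same frames rendered as speaker 1
print(converted.shape)  # (5, 8)
```

Because the speaker identity enters only through the decoder, training on each speaker's own (unaligned) frames never requires frame-wise correspondence between speakers; the latent code is pushed to carry the speaker-independent content.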
Related papers
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017) · 16.34
- Voice Conversion Based On Cross-domain Features Using Variational Auto Encoders (2018) · 11.29
- Non-parallel Voice Conversion With Cyclic Variational Autoencoder (2019) · 12.10
- Learning In Your Voice: Non-parallel Voice Conversion Based On Speaker Consistency Loss (2020) · 0.00
- Singing Voice Conversion With Disentangled Representations Of Singer And Vocal Technique Using Variational Autoencoders (2019) · 10.97
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018) · 14.69
- Semi-supervised Voice Conversion With Amortized Variational Inference (2019) · 3.58
- Adversarially Trained Autoencoders For Parallel-data-free Voice Conversion (2019) · 6.34