Spectrum And Prosody Conversion For Cross-lingual Voice Conversion With CycleGAN
2020 · Zongyang Du, Kun Zhou, Berrak Sisman, et al.
Abstract
Cross-lingual voice conversion aims to change a source speaker's voice to sound like that of a target speaker when the two speakers speak different languages. It relies on non-parallel training data from two different languages and is therefore more challenging than mono-lingual voice conversion. Previous studies on cross-lingual voice conversion mainly focus on spectral conversion and use a linear transformation for F0 transfer. However, F0 is an important prosodic factor and is inherently hierarchical, so a linear method alone is insufficient for its conversion. We propose the use of continuous wavelet transform (CWT) decomposition for F0 modeling. CWT decomposes a signal into different temporal scales that explain prosody at different time resolutions. We also propose to train two CycleGAN pipelines, one for spectrum mapping and one for prosody mapping. In this way, we eliminate the need for parallel data between any two languages and for any alignment technique. Experimental res
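To make the contrast between the two F0 treatments concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the exact form of the linear transformation, the wavelet, or the number of scales, so the code assumes the commonly used log-Gaussian normalised mapping, a Mexican hat mother wavelet, and five dyadic scales. The function names `linear_f0_transform` and `cwt_decompose` are hypothetical.

```python
import numpy as np


def linear_f0_transform(f0_src, src_stats, tgt_stats):
    """Log-Gaussian normalised F0 mapping (assumed form of the linear baseline).

    src_stats and tgt_stats are (mean, std) of log-F0 over voiced frames.
    Unvoiced frames (F0 == 0) are left at 0.
    """
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    f0_src = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    out[voiced] = np.exp(
        (np.log(f0_src[voiced]) - mu_s) / sigma_s * sigma_t + mu_t
    )
    return out


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; an assumed choice."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def cwt_decompose(signal, scales):
    """Return one row of wavelet coefficients per scale (scales in frames)."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the kernel so it is never longer than the signal.
        half = int(min(np.ceil(5 * s), (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)
        # The wavelet is symmetric, so convolution equals correlation.
        out[i] = np.convolve(signal, kernel, mode="same")
    return out


# Synthetic log-F0 contour with a slow (phrase-like) and a fast
# (syllable-like) component, purely for illustration.
frames = np.arange(400)
log_f0 = (5.0
          + 0.10 * np.sin(2 * np.pi * frames / 200)
          + 0.05 * np.sin(2 * np.pi * frames / 20))
normalised = (log_f0 - log_f0.mean()) / log_f0.std()

scales = 2.0 ** np.arange(1, 6)  # five dyadic scales; an arbitrary choice
components = cwt_decompose(normalised, scales)
```

The linear mapping only shifts and rescales the whole log-F0 contour, whereas each row of `components` isolates variation at one temporal scale, which is what allows a model to treat short-term and long-term prosody separately.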
Related papers
- Transforming Spectrum And Prosody For Emotional Voice Conversion With Non-parallel Training Data (2020)
- CycleGAN-VC3: Examining And Improving CycleGAN-VCs For Mel-spectrogram Conversion (2020)
- CycleGAN Voice Conversion Of Spectral Envelopes Using Adversarial Weights (2019)
- CinC-GAN For Effective F0 Prediction For Whisper-to-normal Speech Conversion (2020)
- Baseline System Of Voice Conversion Challenge 2020 With Cyclic Variational Autoencoder And Parallel WaveGAN (2020)
- CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion (2019)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- AutoCycle-VC: Towards Bottleneck-independent Zero-shot Cross-lingual Voice Conversion (2023)