A Novel Cross-lingual Voice Cloning Approach With A Few Text-free Samples
2019 · Xinyong Zhou, Hao Che, Xiaorui Wang, et al.
Abstract
In this paper, we present a cross-lingual voice cloning approach. Bottleneck (BN) features obtained from a speaker-independent automatic speech recognition (SI-ASR) model serve as a bridge across speaker and language boundaries. A latent prosody model captures the relationship between text and BN features, and an acoustic model learns the mapping from BN features to acoustic features. To clone a voice, the acoustic model is fine-tuned on a few samples from the target speaker. The resulting system can synthesize arbitrary utterances in the target language in the voice of a speaker of a different language. We verify that, with a small amount of audio data, the proposed approach handles cross-lingual tasks well. In intra-lingual tasks, it also outperforms the baseline approach in naturalness and similarity.
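The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: the function and class names, the feature dimensions, and the linear stand-in models are all assumptions made so that the example runs with the Python standard library. It shows the three roles (SI-ASR as BN extractor, latent prosody model from text to BN, acoustic model from BN to acoustic features) and why fine-tuning needs only audio, with no transcripts, from the target speaker.

```python
import random

BN_DIM = 4        # hypothetical bottleneck-feature size
ACOUSTIC_DIM = 3  # hypothetical acoustic-feature size


def si_asr_bottleneck(audio_frames):
    """Stand-in for the SI-ASR model: audio frames -> BN features.

    A real system would read a trained recognizer's bottleneck layer; here
    each frame is just truncated or zero-padded to BN_DIM.
    """
    return [(frame + [0.0] * BN_DIM)[:BN_DIM] for frame in audio_frames]


def latent_prosody_model(text):
    """Stand-in for the latent prosody model: text -> BN feature sequence."""
    return [[(ord(ch) % 7) / 7.0] * BN_DIM for ch in text]


class AcousticModel:
    """Linear stand-in for the acoustic model: BN -> acoustic features."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(BN_DIM)]
                  for _ in range(ACOUSTIC_DIM)]

    def predict(self, bn_frame):
        return [sum(w * x for w, x in zip(row, bn_frame)) for row in self.w]

    def loss(self, bn_seq, target_seq):
        # Mean squared error per frame.
        total = 0.0
        for bn, tgt in zip(bn_seq, target_seq):
            total += sum((p - t) ** 2 for p, t in zip(self.predict(bn), tgt))
        return total / len(bn_seq)

    def fine_tune(self, bn_seq, target_seq, lr=0.05, steps=200):
        # Plain per-frame gradient descent on the squared error.
        for _ in range(steps):
            for bn, tgt in zip(bn_seq, target_seq):
                pred = self.predict(bn)
                for i in range(ACOUSTIC_DIM):
                    err = pred[i] - tgt[i]
                    for j in range(BN_DIM):
                        self.w[i][j] -= lr * 2 * err * bn[j]


# A few text-free samples from the target speaker: audio only, no transcripts.
rng = random.Random(1)
target_audio = [[rng.random() for _ in range(6)] for _ in range(8)]
# Toy "acoustic features" of the target speaker (made up for illustration).
target_acoustic = [[sum(f[:4]), f[0] - f[1], f[2]] for f in target_audio]

model = AcousticModel()
bn = si_asr_bottleneck(target_audio)           # BN features need no text
loss_before = model.loss(bn, target_acoustic)
model.fine_tune(bn, target_acoustic)           # adapt to the target voice
loss_after = model.loss(bn, target_acoustic)

# Synthesis: text -> BN (latent prosody model) -> acoustic features.
synth = [model.predict(f) for f in latent_prosody_model("hello")]
```

Because the BN features are extracted from audio alone, the fine-tuning step never touches text. Text enters only at synthesis time through the latent prosody model, which is what would let the target language differ from the language of the cloning samples.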
Related papers
- Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data (2021)
- Latent Linguistic Embedding For Cross-lingual Text-to-speech And Voice Conversion (2020)
- Cross-lingual Multi-speaker Text-to-speech Synthesis For Voice Cloning Without Using Parallel Corpus For Unseen Speakers (2019)
- Neural Voice Cloning With A Few Samples (2018)
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020)
- Data Efficient Voice Cloning For Neural Singing Synthesis (2019)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)
- The THU-HCSI Multi-speaker Multi-lingual Few-shot Voice Cloning System For LIMMITS'24 Challenge (2024)