Latent Linguistic Embedding For Cross-lingual Text-to-speech And Voice Conversion
2020 · Hieu-Thi Luong, Junichi Yamagishi
Abstract
Because the recently proposed voice cloning system NAUTILUS can clone unseen voices from untranscribed speech, we investigate the feasibility of using it to build a unified cross-lingual TTS/VC system. Cross-lingual speech generation is the scenario in which utterances are generated in a target speaker's voice in a language that the speaker did not originally speak. Such a system does not simply clone the target speaker's voice; it essentially creates a new voice that, under a specific framing, can be considered better than the original. Using a well-trained English latent linguistic embedding to build cross-lingual TTS and VC systems for several German, Finnish, and Mandarin speakers from the Voice Conversion Challenge 2020, we show that our method not only achieves cross-lingual VC with high speaker similarity but can also be used seamlessly for cross-lingual TTS without any extra steps. However, the subjective evaluations of perc
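The abstract's central claim is that a single adapted model serves both cross-lingual VC and TTS. This follows if a text encoder and a speech encoder map into one shared latent linguistic space ahead of a common speaker-adapted decoder, and the decoder is adapted using only untranscribed target speech. The toy sketch below illustrates that data flow under strong assumptions. The 2-D unit-norm latent vectors, the idea that speakers differ only by a scalar gain, and every function name (`text_encoder`, `speech_encoder`, `make_decoder`, `adapt_decoder`) are hypothetical stand-ins for neural components. This is not the authors' implementation.

```python
import math

# Hypothetical toy latent linguistic embedding: each linguistic unit is a
# unit-norm 2-D vector. A real system would learn these with neural encoders.
TEXT_TO_LATENT = {
    "a": (1.0, 0.0),
    "i": (0.0, 1.0),
    "u": (-1.0, 0.0),
}


def text_encoder(units):
    """Toy text encoder: look up the latent vector for each linguistic unit."""
    return [TEXT_TO_LATENT[u] for u in units]


def speech_encoder(frames):
    """Toy speaker-independent speech encoder.

    Assumption: speakers differ only by a scalar gain, so normalizing each
    acoustic frame to unit length recovers the (unit-norm) latent vector.
    """
    latents = []
    for x, y in frames:
        norm = math.hypot(x, y)
        latents.append((x / norm, y / norm))
    return latents


def make_decoder(gain):
    """Toy speaker-dependent decoder: scale each latent by the speaker's gain."""
    def decode(latents):
        return [(gain * zx, gain * zy) for zx, zy in latents]
    return decode


def adapt_decoder(untranscribed_frames):
    """Adapt the decoder to a target speaker without transcripts.

    Encode the target's own speech, then fit the gain g by least squares,
    minimizing sum ||x - g*z||^2, which gives g = sum(x.z) / sum(z.z).
    """
    latents = speech_encoder(untranscribed_frames)
    num = sum(x * zx + y * zy
              for (x, y), (zx, zy) in zip(untranscribed_frames, latents))
    den = sum(zx * zx + zy * zy for zx, zy in latents)
    return make_decoder(num / den)


# Target speaker (toy gain 3.0) provides only untranscribed speech with "i" and "u".
target_frames = [(0.0, 3.0), (-3.0, 0.0), (0.0, 3.0)]
target_decoder = adapt_decoder(target_frames)

# Cross-lingual VC: a source speaker (toy gain 0.5) says "a", "i".
source_frames = [(0.5, 0.0), (0.0, 0.5)]
vc_output = target_decoder(speech_encoder(source_frames))
# -> [(3.0, 0.0), (0.0, 3.0)]

# Cross-lingual TTS: same adapted decoder, text path, including the unit "a"
# that never appeared in the target speaker's adaptation data.
tts_output = target_decoder(text_encoder(["a", "i"]))
# -> [(3.0, 0.0), (0.0, 3.0)]
```

In this toy setting, the TTS path needs no adaptation step of its own. Once the shared decoder has been fitted through the speech path, text-encoded latents produce the same target-voice output as converted speech with the same content.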
Related papers
- Cross-lingual Multi-speaker Text-to-speech Synthesis For Voice Cloning Without Using Parallel Corpus For Unseen Speakers (2019)
- A Novel Cross-lingual Voice Cloning Approach With A Few Text-free Samples (2019)
- Learning To Speak Fluently In A Foreign Language: Multilingual Speech Synthesis And Cross-language Voice Cloning (2019)
- Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data (2021)
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020)
- The THU-HCSI Multi-speaker Multi-lingual Few-shot Voice Cloning System For LIMMITS'24 Challenge (2024)
- Towards Natural And Controllable Cross-lingual Voice Conversion Based On Neural TTS Model And Phonetic Posteriorgram (2021)
- Cross-lingual Knowledge Distillation Via Flow-based Voice Conversion For Robust Polyglot Text-to-speech (2023)