Building Multi Lingual TTS Using Cross Lingual Voice Conversion
2020 · Qinghua Sun, Kenji Nagamatsu
Abstract
In this paper, we propose a new cross-lingual voice conversion (VC) approach that generates all speech parameters (MCEP, LF0, BAP) from a single DNN model, using phonetic posteriorgrams (PPGs) extracted from the input speech by several ASR acoustic models. Using the proposed VC method, we tried three different approaches to building a multilingual TTS system without recording a multilingual speech corpus. A listening test was carried out to evaluate both the speech quality (naturalness) of the converted speech and its voice similarity to the target speech. The results show that Approach 1 achieved the highest naturalness (3.28 MOS on a 5-point scale) and the highest similarity (2.77 MOS).
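To illustrate the data flow the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes the per-frame PPGs from each ASR acoustic model are simply concatenated and fed to one small feed-forward network whose output vector is split into MCEP, LF0, and BAP. All dimensions, the single hidden layer, the random weights, and the helper names (`make_layer`, `dense`, `convert_frame`, `random_posterior`) are hypothetical choices made for illustration only.

```python
import random

# All sizes below are illustrative assumptions, not values from the paper.
PPG_DIMS = [40, 48]                      # hypothetical class counts of two ASR models
MCEP_DIM, LF0_DIM, BAP_DIM = 40, 1, 5    # hypothetical output feature sizes
HIDDEN_DIM = 64                          # hypothetical hidden layer width


def make_layer(n_in, n_out, rng):
    """Return a randomly initialised dense layer as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, relu=False):
    """Apply one dense layer to vector x, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def convert_frame(ppgs, hidden_layer, output_layer):
    """Map one frame of PPGs (one vector per ASR model) to (mcep, lf0, bap)."""
    x = [p for ppg in ppgs for p in ppg]      # concatenate PPGs from all ASR models
    h = dense(hidden_layer, x, relu=True)
    y = dense(output_layer, h)                # one model emits all speech parameters
    mcep = y[:MCEP_DIM]
    lf0 = y[MCEP_DIM:MCEP_DIM + LF0_DIM]
    bap = y[MCEP_DIM + LF0_DIM:]
    return mcep, lf0, bap


def random_posterior(dim, rng):
    """Stand-in for a real PPG frame: a random probability vector."""
    raw = [rng.random() for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]


rng = random.Random(0)
hidden_layer = make_layer(sum(PPG_DIMS), HIDDEN_DIM, rng)
output_layer = make_layer(HIDDEN_DIM, MCEP_DIM + LF0_DIM + BAP_DIM, rng)

frame_ppgs = [random_posterior(d, rng) for d in PPG_DIMS]
mcep, lf0, bap = convert_frame(frame_ppgs, hidden_layer, output_layer)
print(len(mcep), len(lf0), len(bap))
```

In a real system the weights would be trained on the target speaker's recordings and the three outputs would be passed to a vocoder; the sketch only shows how a single network can emit all three parameter streams from multi-model PPG input.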
Related papers
- Towards Natural And Controllable Cross-lingual Voice Conversion Based On Neural TTS Model And Phonetic Posteriorgram (2021)
- Building Bilingual And Code-switched Voice Conversion With Limited Training Data Using Embedding Consistency Loss (2021)
- Towards Natural Bilingual And Code-switched Speech Synthesis Based On Mix Of Monolingual Recordings And Cross-lingual Voice Conversion (2020)
- Enhancing Polyglot Voices By Leveraging Cross-lingual Fine-tuning In Any-to-one Voice Conversion (2024)
- AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion (2021)
- Cross-lingual Knowledge Distillation Via Flow-based Voice Conversion For Robust Polyglot Text-to-speech (2023)
- Cross-lingual Text-to-speech With Flow-based Voice Conversion For Improved Pronunciation (2022)
- Transfer Learning From Monolingual ASR To Transcription-free Cross-lingual Voice Conversion (2020)