Enhancing Polyglot Voices By Leveraging Cross-lingual Fine-tuning In Any-to-one Voice Conversion
2024 · Giuseppe Ruggiero, Matteo Testa, Jurgen van de Walle, et al.
Abstract
The creation of artificial polyglot voices remains a challenging task, despite considerable progress in recent years. This paper investigates self-supervised learning for voice conversion to create native-sounding polyglot voices. We introduce a novel cross-lingual any-to-one voice conversion system that is able to preserve the source accent without the need for multilingual data from the target speaker. In addition, we show a novel cross-lingual fine-tuning strategy that further improves the accent and reduces the training data requirements. Objective and subjective evaluations with English, Spanish, French and Mandarin Chinese confirm that our approach improves on state-of-the-art methods, enhancing the speech intelligibility and overall quality of the converted speech, especially in cross-lingual scenarios. Audio samples are available at https://giuseppe-ruggiero.github.io/a2o-vc-demo/
Related papers
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020)
- Cross-lingual Knowledge Distillation Via Flow-based Voice Conversion For Robust Polyglot Text-to-speech (2023)
- Building Bilingual And Code-switched Voice Conversion With Limited Training Data Using Embedding Consistency Loss (2021)
- Towards Natural And Controllable Cross-lingual Voice Conversion Based On Neural TTS Model And Phonetic Posteriorgram (2021)
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- Any-to-one Sequence-to-sequence Voice Conversion Using Self-supervised Discrete Speech Representations (2020)
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021)
- Cross-lingual Text-to-speech With Flow-based Voice Conversion For Improved Pronunciation (2022)