Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech
2020 · Wenjie Li, Benlai Tang, Xiang Yin, et al.
Abstract
Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre. In this paper, we propose approaches to improving accent conversion applicability, as well as quality. First of all, we assume no reference speech is available at the conversion stage, and hence we employ an end-to-end text-to-speech system that is trained on native speech to generate native reference speech. To improve the quality and accent of the converted speech, we introduce reference encoders which make us capable of utilizing multi-source information. This is motivated by acoustic features extracted from native reference and linguistic information, which are complementary to conventional phonetic posteriorgrams (PPGs), so they can be concatenated as features to improve a baseline system based only on PPGs. Moreover, we optimize model architecture using GMM-based attention instead of windowed attention to elevate synthesized performance. Experimental
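The abstract names two mechanisms: concatenating PPGs with reference-encoder features, and GMM-based attention. The sketch below illustrates one common formulation of GMM attention (mixture means that can only move forward along the input), which may differ from the variant the authors used. The function name `gmm_attention_step`, the feature dimensions, the random inputs, and the assumption that reference features are frame-aligned with the PPGs are all hypothetical and not taken from the paper.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); output is always positive.
    return np.logaddexp(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gmm_attention_step(prev_means, raw_weights, raw_shifts, raw_scales, num_frames):
    """One decoder step of GMM attention over `num_frames` input positions.

    The raw_* arrays have shape (K,) and stand in for the outputs of a small
    network on the decoder state (hypothetical; not specified in the abstract).
    """
    weights = softmax(raw_weights)               # mixture weights sum to 1
    means = prev_means + softplus(raw_shifts)    # means only move forward
    scales = softplus(raw_scales) + 1e-5         # strictly positive widths
    positions = np.arange(num_frames)[:, None]   # shape (T, 1)
    density = np.exp(-0.5 * ((positions - means) / scales) ** 2)
    density = density / (scales * np.sqrt(2.0 * np.pi))
    # Unnormalized over the discrete positions: a weighted sum of K Gaussians.
    alignment = (density * weights).sum(axis=1)  # shape (T,)
    return alignment, means

rng = np.random.default_rng(0)
num_frames, ppg_dim, ref_dim, K = 50, 8, 4, 3

# Assumed frame-aligned features, concatenated along the feature axis.
ppg = rng.random((num_frames, ppg_dim))
ref = rng.random((num_frames, ref_dim))
memory = np.concatenate([ppg, ref], axis=-1)     # shape (T, ppg_dim + ref_dim)

means = np.zeros(K)
mean_history = []
for _ in range(5):
    alignment, means = gmm_attention_step(
        means, rng.normal(size=K), rng.normal(size=K), rng.normal(size=K), num_frames
    )
    context = alignment @ memory                 # attention-weighted summary
    mean_history.append(means.copy())
```

Because each shift passes through `softplus`, the mixture means never move backward, which is the property that makes this family of attention mechanisms monotonic over the input sequence.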
Related papers
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024) 0.00
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024) 5.84
- Transfer The Linguistic Representations From TTS To Accent Conversion With Non-parallel Data (2024) 6.77
- Disentangling Segmental And Prosodic Factors To Non-native Speech Comprehensibility (2024) 0.00
- Accent And Speaker Disentanglement In Many-to-many Voice Conversion (2020) 10.35
- TTS-guided Training For Accent Conversion Without Parallel Data (2022) 8.60
- Synthetic Cross-accent Data Augmentation For Automatic Speech Recognition (2023) 0.00
- Zero-shot Accent Conversion Using Pseudo Siamese Disentanglement Network (2022) 5.24