TTS-Guided Training for Accent Conversion Without Parallel Data
2022 · Yi Zhou, Zhizheng Wu, Mingyang Zhang, et al.
Abstract
Accent conversion (AC) seeks to change the accent of speech from one accent (source) to another (target) while preserving the speech content and speaker identity. However, many AC approaches rely on source-target parallel speech data. We propose a novel accent conversion framework that does not require parallel data. Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data, so that the TTS model and its hidden representations are expected to be associated only with the target accent. Then, a speech encoder is trained to convert the accent of the speech under the supervision of the pretrained TTS model: the source-accented speech and its corresponding transcription are fed to the speech encoder and the pretrained TTS system, respectively, and the output of the speech encoder is optimized to match the text embedding in the TTS system. At run-time, the speech encoder is combined with the pretrained TTS decoder to convert the source-accented speech into the target accent.
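The training and run-time flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not state how the speech and text sequences are aligned in length or which loss is used, so this sketch assumes one speech feature vector per text token and a squared-error objective. Every name here (`TEXT_EMBEDDING`, `accented_features`, `train_encoder`, `tts_decode`, `convert`) is hypothetical; a frozen lookup table stands in for the pretrained TTS text encoder, a linear map for the speech encoder, and a nearest-embedding lookup for the TTS decoder.

```python
# Toy sketch of TTS-guided encoder training (all names hypothetical).
# Assumptions: one speech feature vector per text token (lengths already
# aligned), a frozen lookup table stands in for the pretrained TTS text
# encoder, a linear map stands in for the speech encoder, and a
# nearest-embedding lookup stands in for the pretrained TTS decoder.

# Frozen "TTS text embeddings", assumed learned only from target-accented data.
TEXT_EMBEDDING = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}


def accented_features(token):
    """Hypothetical source-accented acoustic features for one token."""
    e0, e1 = TEXT_EMBEDDING[token]
    return [2.0 * e0, e1 - 0.5 * e0]


def encode(weights, feature):
    """Linear speech encoder: maps a feature vector into embedding space."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def mse_loss(weights, pairs):
    """Mean squared distance between encoder outputs and frozen text embeddings."""
    total = 0.0
    for feature, target in pairs:
        out = encode(weights, feature)
        total += sum((o - t) ** 2 for o, t in zip(out, target))
    return total / len(pairs)


def train_encoder(transcripts, steps=200, lr=0.1):
    """Gradient descent on the encoder only; the TTS embeddings stay frozen."""
    # Source speech goes to the encoder; its transcript goes to the TTS side.
    pairs = [(accented_features(tok), TEXT_EMBEDDING[tok])
             for text in transcripts for tok in text]
    weights = [[0.0, 0.0], [0.0, 0.0]]
    history = [mse_loss(weights, pairs)]
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for feature, target in pairs:
            out = encode(weights, feature)
            for i in range(2):
                for j in range(2):
                    grad[i][j] += 2.0 * (out[i] - target[i]) * feature[j] / len(pairs)
        weights = [[weights[i][j] - lr * grad[i][j] for j in range(2)]
                   for i in range(2)]
        history.append(mse_loss(weights, pairs))
    return weights, history


def tts_decode(embedding):
    """Stand-in decoder: returns the token whose frozen embedding is nearest."""
    return min(TEXT_EMBEDDING, key=lambda tok: sum(
        (e - x) ** 2 for e, x in zip(TEXT_EMBEDDING[tok], embedding)))


def convert(weights, features):
    """Run-time path: speech encoder -> pretrained TTS decoder (no text needed)."""
    return [tts_decode(encode(weights, f)) for f in features]


weights, history = train_encoder([["a", "b", "c"], ["c", "a"]])
print(round(history[0], 3), round(history[-1], 6))  # 1.4 0.0
print(convert(weights, [accented_features(t) for t in ["b", "a", "c"]]))  # ['b', 'a', 'c']
```

The transcript is needed only during training, where it supplies the frozen target embeddings. At run-time, `convert` uses only the source-accented features, which mirrors the abstract's point that the speech encoder replaces the text front end of the pretrained TTS system.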
Related papers
- Transfer The Linguistic Representations From TTS To Accent Conversion With Non-parallel Data (2024)
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020)
- Bootstrapping Non-parallel Voice Conversion From Speaker-adaptive Text-to-speech (2019)
- Training Text-to-speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks (2022)
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- Transfer Learning From Speech Synthesis To Voice Conversion With Non-parallel Training Data (2020)
- Zero-shot Accent Conversion Using Pseudo Siamese Disentanglement Network (2022)