Zero-shot Accent Conversion Using Pseudo Siamese Disentanglement Network
2022 · Dongya Jia, Qiao Tian, Kainan Peng, et al.
Abstract
The goal of accent conversion (AC) is to convert the accent of speech into a target accent while preserving the content and speaker identity. AC enables a variety of applications, such as language learning, speech content creation, and data augmentation. Previous methods either rely on reference utterances in the inference phase or are unable to preserve speaker identity. To address these issues, we propose a zero-shot, reference-free accent conversion method that is able to convert unseen speakers' utterances into a target accent. We propose a Pseudo Siamese Disentanglement Network (PSDN) to disentangle the accent from the content representation. Experimental results show that our model generates speech samples with much higher accentedness than the input and comparable naturalness, on two-way conversion including foreign-to-native and native-to-foreign.
Related papers
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- Accent And Speaker Disentanglement In Many-to-many Voice Conversion (2020)
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020)
- AccentBox: Towards High-fidelity Zero-shot Accent Generation (2024)
- TTS-guided Training For Accent Conversion Without Parallel Data (2022)
- Disentangling Segmental And Prosodic Factors To Non-native Speech Comprehensibility (2024)
- ACE-VC: Adaptive And Controllable Voice Conversion Using Explicitly Disentangled Self-supervised Speech Representations (2023)
- Transfer The Linguistic Representations From TTS To Accent Conversion With Non-parallel Data (2024)