Accent-VITS: Accent Transfer For End-to-end TTS
2023 · Linhan Ma, Yongmao Zhang, Xinfa Zhu, et al.
Abstract
Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VITS-based end-to-end accent transfer model named Accent-VITS. Building on the main structure of VITS, Accent-VITS makes substantial improvements to enable effective and stable accent transfer. We leverage a hierarchical CVAE structure to model accent pronunciation information and acoustic features, using bottleneck features and mel spectrograms, respectively, as constraints. Moreover, the text-to-wave mapping in VITS is decomposed into text-to-accent and accent-to-wave mappings in Accent-VITS. In this way, the disentanglement of accent and speaker timbre becomes more stable and effective. Experiments on multi-accent and Mandarin datasets show that Accent-VITS achieves higher speaker similarity, accent similarity and speech naturalness as compared with …
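The abstract does not spell out the training objective. As a rough illustration only, and assuming (not confirmed here) that each level of the hierarchical CVAE is trained VITS-style with a KL term between a posterior and a prior, the two-level decomposition can be sketched as two summed KL terms: one for the accent latent (posterior from bottleneck features, prior from text) and one for the acoustic latent (posterior from the mel spectrogram, prior conditioned on the accent latent and speaker). The function names `kl_diag_gaussian` and `hierarchical_kl` are hypothetical, and the closed-form Gaussian KL is a simplification; VITS-style models typically place a normalizing flow between posterior and prior.

```python
import math
from typing import Sequence, Tuple

# A Gaussian is given as (means, log-standard-deviations), one entry per latent dim.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussian(mu_q: Sequence[float], logs_q: Sequence[float],
                     mu_p: Sequence[float], logs_p: Sequence[float]) -> float:
    """Closed-form KL(q || p) between two diagonal Gaussians (mean, log-std)."""
    if not (len(mu_q) == len(logs_q) == len(mu_p) == len(logs_p)):
        raise ValueError("all parameter vectors must have the same length")
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logs_q, mu_p, logs_p):
        total += (lp - lq
                  + (math.exp(2 * lq) + (mq - mp) ** 2) / (2 * math.exp(2 * lp))
                  - 0.5)
    return total


def hierarchical_kl(accent_posterior: Gaussian, accent_prior: Gaussian,
                    acoustic_posterior: Gaussian, acoustic_prior: Gaussian) -> float:
    """Hypothetical two-level CVAE regularizer (an illustrative assumption).

    Level 1 (text-to-accent): posterior from bottleneck features, prior from text.
    Level 2 (accent-to-wave): posterior from the mel spectrogram, prior conditioned
    on the accent latent and the target speaker.
    """
    level1 = kl_diag_gaussian(*accent_posterior, *accent_prior)
    level2 = kl_diag_gaussian(*acoustic_posterior, *acoustic_prior)
    return level1 + level2


if __name__ == "__main__":
    accent_q = ([0.0], [0.0])           # N(0, 1)
    accent_p = ([1.0], [0.0])           # N(1, 1)
    acoustic_q = ([0.5, -0.5], [0.0, 0.0])
    acoustic_p = ([0.5, -0.5], [0.0, 0.0])  # identical to the posterior -> KL 0
    print(hierarchical_kl(accent_q, accent_p, acoustic_q, acoustic_p))  # 0.5
```

Only the accent level contributes here, since the acoustic posterior and prior coincide; the printed value is the KL between N(0, 1) and N(1, 1), which is 0.5.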
Related papers
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- Training Text-to-speech Systems From Synthetic Data: A Practical Approach For Accent Transfer Tasks (2022)
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- DART: Disentanglement Of Accent And Speaker Representation In Multispeaker Text-to-speech (2024)
- Transfer The Linguistic Representations From TTS To Accent Conversion With Non-parallel Data (2024)
- VANI: Very-lightweight Accent-controllable TTS For Native And Non-native Speakers With Identity Preservation (2023)
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021)
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020)