Multi-scale Accent Modeling And Disentangling For Multi-speaker Multi-accent Text-to-speech Synthesis
2024 Β· Xuehao Zhou, Mingyang Zhang, Yi Zhou, et al.
Abstract
Generating speech across different accents while preserving speaker identity is crucial for various real-world applications. However, accurately and independently modeling both speaker and accent characteristics in text-to-speech (TTS) systems is challenging due to the complex variations of accents and the inherent entanglement between speaker and accent identities. In this paper, we propose a novel approach for multi-speaker multi-accent TTS synthesis that aims to synthesize speech for multiple speakers, each with various accents. Our approach employs a multi-scale accent modeling strategy to address accent variations on different levels. Specifically, we introduce both global (utterance level) and local (phoneme level) accent modeling to capture overall accent characteristics within an utterance and fine-grained accent variations across phonemes, respectively. To enable independent control of speakers and accents, we use the speaker embedding to represent speaker identity and achieve
Authors
(none)
Tags
Stats
Related papers
- DART: Disentanglement Of Accent And Speaker Representation In Multispeaker Text-to-speech (2024)0.00
- Cross-dialect Text-to-speech In Pitch-accent Language Incorporating Multi-dialect Phoneme-level BERT (2024)3.58
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)0.00
- Macst: Multi-accent Speech Synthesis Via Text Transliteration For Accent Conversion (2024)5.24
- Accent And Speaker Disentanglement In Many-to-many Voice Conversion (2020)10.35
- Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios (2021)6.77
- Cross-lingual Multi-speaker Text-to-speech Synthesis For Voice Cloning Without Using Parallel Corpus For Unseen Speakers (2019)0.00
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)5.84