MacST: Multi-accent Speech Synthesis Via Text Transliteration For Accent Conversion
2024 · Sho Inoue, Shuai Wang, Wanxing Wang, et al.
Abstract
In accented voice conversion, or accent conversion, we seek to convert the accent in speech from one to another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, that is, pairs of accented speech samples by the same speaker, through text transliteration for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence model on the synthetic parallel corpus for accent conversion. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.
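The data-generation flow described in the abstract (LLM transliteration, then multilingual TTS, then pairing by speaker) can be sketched roughly as below. This is an illustrative outline only, not the authors' implementation: `build_parallel_corpus`, `Utterance`, `ParallelPair`, and the two stub callables are hypothetical names, the stubs stand in for a real LLM and a real multilingual TTS model, and pairing a plain English rendering with the transliterated rendering is an assumption about how the pairs are formed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    speaker: str
    text: str       # text actually fed to the TTS model
    language: str   # language setting used for synthesis
    audio: bytes    # placeholder for a waveform


@dataclass
class ParallelPair:
    english: Utterance    # original English text, English TTS setting
    accented: Utterance   # transliterated text, other-language TTS setting


def build_parallel_corpus(
    sentences: List[str],
    speaker: str,
    accent_language: str,
    transliterate: Callable[[str, str], str],
    synthesize: Callable[[str, str, str], bytes],
) -> List[ParallelPair]:
    """Pair each sentence's English rendering with an accented rendering.

    `transliterate` stands in for an LLM call and `synthesize` for a
    multilingual TTS call; both are hypothetical interfaces.
    """
    pairs = []
    for sentence in sentences:
        # Step 1: rewrite the English sentence in the target language's script.
        translit = transliterate(sentence, accent_language)
        # Step 2: synthesize both versions with the same speaker identity.
        english = Utterance(speaker, sentence, "en",
                            synthesize(sentence, "en", speaker))
        accented = Utterance(speaker, translit, accent_language,
                             synthesize(translit, accent_language, speaker))
        pairs.append(ParallelPair(english, accented))
    return pairs


# Stand-in stubs so the sketch runs without any model (assumptions, not real APIs).
def fake_transliterate(text: str, language: str) -> str:
    return f"[{language}] {text}"


def fake_synthesize(text: str, language: str, speaker: str) -> bytes:
    return f"{speaker}|{language}|{text}".encode("utf-8")


corpus = build_parallel_corpus(
    ["hello world"], "spk1", "ja", fake_transliterate, fake_synthesize
)
```

Passing the transliteration and synthesis steps in as callables keeps the pairing logic independent of whichever LLM or TTS system is actually used.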
Related papers
- Accent Conversion Using Discrete Units With Parallel Data Synthesized From Controllable Accented TTS (2024)
- Multi-scale Accent Modeling And Disentangling For Multi-speaker Multi-accent Text-to-speech Synthesis (2024)
- Transfer The Linguistic Representations From TTS To Accent Conversion With Non-parallel Data (2024)
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- Improving Accent Conversion With Reference Encoder And End-to-end Text-to-speech (2020)
- Synthetic Cross-accent Data Augmentation For Automatic Speech Recognition (2023)
- TTS-guided Training For Accent Conversion Without Parallel Data (2022)
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020)