Building Speech Corpus With Diverse Voice Characteristics For Its Prompt-based Representation
2024 · Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, et al.
Abstract
In text-to-speech synthesis, the ability to control voice characteristics is vital for various applications. By leveraging thriving text prompt-based generation techniques, it should be possible to enhance the nuanced control of voice characteristics. While previous research has explored the prompt-based manipulation of voice characteristics, most studies have used pre-recorded speech, which limits the diversity of voice characteristics available. Thus, we aim to address this gap by creating a novel corpus and developing a model for prompt-based manipulation of voice characteristics in text-to-speech synthesis, facilitating a broader range of voice characteristics. Specifically, we propose a method to build a sizable corpus pairing voice characteristics descriptions with corresponding speech samples. This involves automatically gathering voice-related speech data from the Internet, ensuring its quality, and manually annotating it using crowdsourcing. We implement this method with Japan
Related papers
- Coco-Nut: Corpus Of Japanese Utterance And Voice Characteristics Description For Prompt-based Control (2023) · 5.84
- Generating Speakers By Prompting Listener Impressions For Pre-trained Multi-speaker Text-to-speech Systems (2024) · 3.58
- PromptTTS++: Controlling Speaker Identity In Prompt-based Text-to-speech Using Natural Language Descriptions (2023) · 9.23
- Prompt-Singer: Controllable Singing-voice-synthesis With Natural Language Prompt (2024) · 6.77
- InstructTTS: Modelling Expressive TTS In Discrete Latent Space With Natural Language Style Prompt (2023) · 0.00
- LibriTTS-P: A Corpus With Speaking Style And Speaker Identity Prompts For Text-to-speech And Style Captioning (2024) · 11.91
- TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-speech Models (2023) · 9.59
- ProEmo: Prompt-driven Text-to-speech Synthesis Based On Emotion And Intensity Control (2025) · 0.00