Coco-Nut: Corpus Of Japanese Utterance And Voice Characteristics Description For Prompt-based Control
2023 · Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, et al.
Abstract
In text-to-speech, controlling voice characteristics is important for achieving speech synthesis that serves a variety of purposes. Considering the success of text-conditioned generation, such as text-to-image, free-form text instructions should be useful for intuitive and complex control of voice characteristics. A sufficiently large corpus of high-quality and diverse voice samples with corresponding free-form descriptions can advance research on such control. However, neither an open corpus nor a scalable construction method is currently available. To this end, we develop Coco-Nut, a new corpus of diverse Japanese utterances, along with text transcriptions and free-form voice characteristics descriptions. Our methodology for constructing this corpus consists of 1) automatic collection of voice-related audio data from the Internet, 2) quality assurance, and 3) manual annotation using crowdsourcing. Additionally, we benchmark our corpus on a prompt embedding model trained by contrastive speech-text learning.
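The abstract does not say which objective the contrastive speech-text learning uses. As a rough illustration only, the sketch below assumes a CLIP-style symmetric cross-entropy objective over a batch of paired speech and description embeddings; the function names, the temperature value of 0.07, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def symmetric_contrastive_loss(speech_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: pair i of speech and text is the positive,
    every other pairing in the batch is a negative."""
    speech = [l2_normalize(v) for v in speech_embs]
    text = [l2_normalize(v) for v in text_embs]
    n = len(speech)

    # Cosine similarity between every speech/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(s, t)) / temperature for t in text]
        for s in speech
    ]

    # Speech-to-text direction: each row should peak on its diagonal entry.
    speech_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-speech direction: the same check on the columns.
    columns = [list(col) for col in zip(*logits)]
    text_to_speech = -sum(log_softmax(columns[i])[i] for i in range(n)) / n

    return 0.5 * (speech_to_text + text_to_speech)


# Toy embeddings standing in for encoder outputs (hypothetical values).
speech_batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched_loss = symmetric_contrastive_loss(speech_batch, speech_batch)
shuffled_loss = symmetric_contrastive_loss(
    speech_batch, speech_batch[1:] + speech_batch[:1]
)
print(matched_loss, shuffled_loss)
```

With correctly paired embeddings the loss is close to zero, and it grows when the descriptions are shuffled against the utterances, which is the property a prompt embedding model trained this way would rely on.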
Related papers
- Building Speech Corpus With Diverse Voice Characteristics For Its Prompt-based Representation (2024)
- JVS Corpus: Free Japanese Multi-speaker Voice Corpus (2019)
- Who Finds This Voice Attractive? A Large-scale Experiment Using In-the-wild Data (2024)
- LibriTTS-P: A Corpus With Speaking Style And Speaker Identity Prompts For Text-to-speech And Style Captioning (2024)
- SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned To MSCOCO Data Set (2017)
- TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-speech Models (2023)
- PromptTTS++: Controlling Speaker Identity In Prompt-based Text-to-speech Using Natural Language Descriptions (2023)
- The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge: Tasks, Results And Findings (2024)