Improved Child Text-to-speech Synthesis Through Fastpitch-based Transfer Learning
2023 · Rishabh Jain, Peter Corcoran
Abstract
Speech synthesis technology has witnessed significant advancements in recent years, enabling the creation of natural and expressive synthetic speech. One area of particular interest is the generation of synthetic child speech, which presents unique challenges due to children's distinct vocal characteristics and developmental stages. This paper presents a novel approach that leverages the FastPitch text-to-speech (TTS) model for generating high-quality synthetic child speech. The study uses a transfer learning training pipeline in which a multi-speaker TTS model is finetuned to work with child speech. We use the cleaned version of the publicly available MyST dataset (55 hours) for our finetuning experiments. We also release a prototype dataset of synthetic speech samples generated from this research, together with model code, to support further research. Using a pretrained MOSNet, we conducted an objective assessment that showed a significant correlation between real and
Related papers
- A Text-to-speech Pipeline, Evaluation Methodology, And Initial Fine-tuning Results For Child Speech Synthesis (2022)
- Fastpitch: Parallel Text-to-speech With Pitch Prediction (2020)
- Enhancement Of Pitch Controllability Using Timbre-preserving Pitch Augmentation In Fastpitch (2022)
- Fastspeech: Fast, Robust And Controllable Text To Speech (2019)
- Fastspeech 2: Fast And High-quality End-to-end Text To Speech (2020)
- Stable-tts: Stable Speaker-adaptive Text-to-speech Synthesis Via Prosody Prompting (2024)
- Tts-by-tts: Tts-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)