Low-resource Expressive Text-to-speech Using Data Augmentation
2020 Β· Goeric Huybrechts, Thomas Merritt, Giulia Comini, et al.
Abstract
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to circumvent the costly operation of recording large amounts of target data in order to build expressive style voices with as little as 15 minutes of such recordings. First, we augment data via voice conversion by leveraging recordings in the desired speaking style from other speakers. Next, we use that synthetic data on top of the available recordings to train a TTS model. Finally, we fine-tune that model to further increase quality. Our evaluations show that the proposed changes bring significant improvements over non-augmented models across many perceived aspects of synthesised speech. We demonstrate the proposed approach on 2 styles (newscaster and conversational), on various speakers, and on both single and multi-speaker models, illustra
Authors
(none)
Tags
Stats
Related papers
- Low-data? No Problem: Low-resource, Language-agnostic Conversational Text-to-speech Via F0-conditioned Data Augmentation (2022)0.00
- Non-autoregressive TTS With Explicit Duration Modelling For Low-resource Highly Expressive Speech (2021)8.82
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)11.49
- Cross-speaker Emotion Transfer For Low-resource Text-to-speech Using Non-parallel Voice Conversion With Pitch-shift Data Augmentation (2022)8.09
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)6.77
- In Other News: A Bi-style Text-to-speech Model For Synthesizing Newscaster Voice With Limited Data (2019)8.35
- Tts-by-tts: Tts-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)9.59
- Frustratingly Easy Data Augmentation For Low-resource ASR (2025)0.00