Low-resource Self-supervised Learning With SSL-enhanced TTS
2023 · Po-Chun Hsu, Ali Elkahky, Wei-Ning Hsu, et al.
Abstract
Self-supervised learning (SSL) techniques have achieved remarkable results in various speech processing tasks. Nonetheless, a significant challenge remains in reducing the reliance on vast amounts of speech data for pre-training. This paper proposes to address this challenge by leveraging synthetic speech to augment a low-resource pre-training corpus. We construct a high-quality text-to-speech (TTS) system with limited resources using SSL features and generate a large synthetic corpus for pre-training. Experimental results demonstrate that our proposed approach effectively reduces the demand for speech data by 90% with only slight performance degradation. To the best of our knowledge, this is the first work aiming to enhance low-resource self-supervised learning in speech processing.
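The abstract says a small real corpus is augmented with a large TTS-generated corpus before SSL pre-training, but it does not say how the two are combined. The sketch below assumes the simplest option: pool real and synthetic utterances into one shuffled list, and track what share of the audio is real. It does not model the TTS system or the SSL training itself. The names `Utterance`, `build_pretraining_corpus`, and `real_speech_fraction` are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One audio clip in the pre-training corpus (hypothetical record type)."""
    path: str
    seconds: float
    synthetic: bool  # True if generated by the TTS system


def build_pretraining_corpus(real, synthetic, seed=0):
    """Pool real and synthetic clips into one shuffled list.

    Assumption: plain pooling with a seeded shuffle. The paper's actual
    mixing or sampling strategy is not stated in the abstract.
    """
    corpus = list(real) + list(synthetic)
    random.Random(seed).shuffle(corpus)  # deterministic for a fixed seed
    return corpus


def real_speech_fraction(corpus):
    """Share of total audio duration that comes from real (non-synthetic) speech."""
    total = sum(u.seconds for u in corpus)
    real = sum(u.seconds for u in corpus if not u.synthetic)
    return real / total if total else 0.0
```

For instance, pooling 10 real and 90 synthetic clips of equal (made-up) length gives a corpus in which real speech accounts for 10% of the audio. Uniform pooling is only one choice. Weighting or sampling synthetic data differently from real data would be an equally plausible design.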
Related papers
- Analytic Study Of Text-free Speech Synthesis For Raw Audio Using A Self-supervised Learning Model (2024)
- Combining Spectral And Self-supervised Features For Low Resource Speech Recognition And Translation (2022)
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022)
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022)
- On The Use Of Self-supervised Speech Representations In Spontaneous Speech Synthesis (2023)
- Target Speech Extraction With Pre-trained Self-supervised Learning Models (2024)
- Enhancing Synthetic Training Data For Speech Commands: From ASR-based Filtering To Domain Adaptation In SSL Latent Space (2024)
- An Initial Investigation Of Language Adaptation For TTS Systems Under Low-resource Scenarios (2024)