Enhancing Synthetic Training Data For Speech Commands: From ASR-based Filtering To Domain Adaptation In SSL Latent Space
2024 · Sebastião Quintas, Isabelle Ferrané, Thomas Pellegrini
Abstract
The use of synthetic speech for data augmentation is becoming increasingly popular in fields such as automatic speech recognition and speech classification. Although recent text-to-speech systems with voice cloning capabilities make a larger number of voices available from short audio segments, these systems are known to hallucinate and often produce bad data that is likely to harm the downstream task. In the present work, we conduct a set of experiments on zero-shot learning with synthetic speech data for the specific task of speech command classification. Our results on the Google Speech Commands dataset show that a simple ASR-based filtering method can have a big impact on the quality of the generated data, which translates into better performance. Furthermore, despite the good quality of the generated speech data, we also show that synthetic and real speech can still be easily distinguished when using self-supervised (Wav
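The abstract does not spell out how the ASR-based filter works, so the following is only a minimal sketch of one plausible reading: keep a synthetic clip only if an ASR system transcribes it as the command it was generated for. The names `normalize`, `filter_by_asr`, and the `transcribe` callable are hypothetical, and a lookup table stands in for a real ASR model so the example runs on its own.

```python
import re
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (audio path, intended command label)


def normalize(text: str) -> str:
    """Lowercase and drop every non-letter character (spaces included),
    so that 'Yes.' and 'yes' compare equal."""
    return re.sub(r"[^a-z]", "", text.lower())


def filter_by_asr(samples: List[Sample],
                  transcribe: Callable[[str], str]) -> List[Sample]:
    """Keep only the samples whose ASR transcript matches the intended label.

    `transcribe` is an assumed interface: any function that maps an audio
    path to a text transcript. The exact matching rule used in the paper
    is not given in the abstract; exact match after normalization is an
    assumption made for this sketch.
    """
    kept = []
    for audio_path, label in samples:
        if normalize(transcribe(audio_path)) == normalize(label):
            kept.append((audio_path, label))
    return kept


# Stand-in for a real ASR model: a fixed table of made-up transcripts.
fake_transcripts = {
    "yes_001.wav": "Yes.",
    "yes_002.wav": "yeah",   # synthesis drifted away from the target word
    "stop_001.wav": "stop",
}

samples = [
    ("yes_001.wav", "yes"),
    ("yes_002.wav", "yes"),
    ("stop_001.wav", "stop"),
]

kept = filter_by_asr(samples, fake_transcripts.__getitem__)
print(kept)
```

In practice `transcribe` would wrap an actual ASR model. Requiring an exact match is a strict choice; a looser rule (for example an edit-distance threshold) would trade data quality for a larger retained set.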
Related papers
- Spoken Language Corpora Augmentation With Domain-specific Voice-cloned Speech (2024)
- Automatic Data Augmentation For Domain Adapted Fine-tuning Of Self-supervised Speech Representations (2023)
- A Simple Baseline For Domain Adaptation In End To End ASR Systems Using Synthetic Data (2022)
- Low-resource Self-supervised Learning With SSL-enhanced TTS (2023)
- On The Effect Of Purely Synthetic Training Data For Different Automatic Speech Recognition Architectures (2024)
- A Domain Adaptation Framework For Speech Recognition Systems With Only Synthetic Data (2025)
- Corpus Synthesis For Zero-shot ASR Domain Adaptation Using Large Language Models (2023)
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)