Deepfake Audio As A Data Augmentation Technique For Training Automatic Speech To Text Transcription Models
2023 · Alexandre R. Ferreira, Cláudio E. C. Campelo
Abstract
Training transcription models that produce robust results requires a large and diverse labeled dataset. Finding data with the necessary characteristics is challenging, especially for languages less widely represented than English, and producing it takes significant effort and often money. Data augmentation is one strategy for mitigating this problem. In this work, we propose a framework for data augmentation based on deepfake audio. To validate the framework, we conducted experiments using existing deepfake and transcription models. We selected a voice cloner and an English-language dataset recorded by Indian speakers, which ensures that a single accent is present in the dataset. The augmented data was then used to train speech-to-text models in various scenarios.
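The abstract gives no implementation details, so the following is only a minimal sketch of how a voice-cloning augmentation step could be wired up, not the authors' framework. The `Utterance` record, the `augment_with_cloned_speech` function, the `VoiceCloner` signature, and the `fake_cloner` stand-in are all hypothetical names introduced for illustration; a real pipeline would replace `fake_cloner` with an actual voice-cloning model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Utterance:
    speaker_id: str
    audio_path: str
    transcript: str


# Hypothetical interface: (reference_audio_path, text) -> path of synthesized audio.
VoiceCloner = Callable[[str, str], str]


def augment_with_cloned_speech(
    dataset: List[Utterance], clone_voice: VoiceCloner
) -> List[Utterance]:
    """Return the original utterances followed by synthetic ones.

    Assumption: each transcript is re-spoken once in the cloned voice of
    every other speaker, using that speaker's first recording as reference.
    """
    # Pick one reference recording per speaker (the first one seen).
    references: Dict[str, str] = {}
    for utt in dataset:
        references.setdefault(utt.speaker_id, utt.audio_path)

    augmented = list(dataset)
    for utt in dataset:
        for speaker_id, ref_path in references.items():
            if speaker_id == utt.speaker_id:
                continue  # skip re-synthesizing a speaker's own utterance
            fake_path = clone_voice(ref_path, utt.transcript)
            augmented.append(
                Utterance(f"{speaker_id}-cloned", fake_path, utt.transcript)
            )
    return augmented


def fake_cloner(reference_audio_path: str, text: str) -> str:
    # Stand-in for a real voice cloner: returns a made-up output path
    # and synthesizes no audio.
    stem = reference_audio_path.rsplit(".", 1)[0]
    return f"synthetic/{stem}_{len(text)}.wav"
```

The returned list keeps labels intact because every synthetic clip reuses the transcript it was generated from, so it can be passed to a speech-to-text training loop in the same form as the original data.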
Related papers
- AUDETER: A Large-scale Dataset For Deepfake Audio Detection In Open Worlds (2025)
- TranssionADD: A Multi-frame Reinforcement Based Sequence Tagging Model For Audio Deepfake Detection (2023)
- Training-free Deepfake Voice Recognition By Leveraging Large-scale Pre-trained Models (2024)
- Zero-day Audio Deepfake Detection Via Retrieval Augmentation And Profile Matching (2025)
- MLAAD: The Multi-language Audio Anti-spoofing Dataset (2024)
- Data Efficient Voice Cloning For Neural Singing Synthesis (2019)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)
- Spoken Language Corpora Augmentation With Domain-specific Voice-cloned Speech (2024)