Frustratingly Easy Data Augmentation For Low-resource ASR
2025 · Katsumi Ibaraki, David Chiang
Abstract
This paper introduces three self-contained data augmentation methods for low-resource Automatic Speech Recognition (ASR). Our techniques first generate novel text (using gloss-based replacement, random replacement, or an LLM-based approach) and then apply Text-to-Speech (TTS) to produce synthetic audio. We apply these methods, which leverage only the original annotated data, to four languages with extremely limited resources: Vatlongos, Nashta, Shinekhen Buryat, and Kakabe. Fine-tuning a pretrained Wav2Vec2-XLSR-53 model on a combination of the original audio and the generated synthetic data yields significant performance gains, including a 14.3% absolute word error rate (WER) reduction for Nashta. The methods prove effective across all four low-resource languages and also show utility for high-resource languages such as English, demonstrating their broad applicability.
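The text-generation step of the random-replacement variant can be illustrated with a minimal sketch. This is not the authors' implementation: uniform sampling from the training vocabulary, the per-word replacement probability `p`, and the function names `build_vocabulary`, `random_replace`, and `augment` are hypothetical choices made for illustration. In the pipeline described above, the generated sentences would then be passed to a TTS system to produce synthetic audio; that step is omitted here.

```python
import random
from typing import List, Sequence


def build_vocabulary(transcripts: Sequence[str]) -> List[str]:
    """Collect the distinct whitespace-separated words in the original transcripts."""
    # sorted() fixes the vocabulary order, so seeded sampling is reproducible
    return sorted({word for line in transcripts for word in line.split()})


def random_replace(sentence: str, vocab: Sequence[str], p: float, rng: random.Random) -> str:
    """Replace each word independently, with probability p, by a word drawn uniformly from vocab."""
    words = sentence.split()
    out = [rng.choice(vocab) if rng.random() < p else w for w in words]
    return " ".join(out)


def augment(
    transcripts: Sequence[str], n_per_sentence: int = 2, p: float = 0.3, seed: int = 0
) -> List[str]:
    """Generate n_per_sentence variants of every transcript (hypothetical settings)."""
    rng = random.Random(seed)
    # Only the original annotated text is used as the word source
    vocab = build_vocabulary(transcripts)
    new_texts = []
    for line in transcripts:
        for _ in range(n_per_sentence):
            new_texts.append(random_replace(line, vocab, p, rng))
    return new_texts


if __name__ == "__main__":
    # Toy placeholder transcripts, not data from the paper
    original = ["the dog runs home", "a cat sleeps at home"]
    for text in augment(original, n_per_sentence=2, p=0.5, seed=1):
        print(text)
```

Each variant keeps the sentence length of its source and draws replacement words only from the original transcripts, so no external text resources are needed; the trade-off is that randomly substituted sentences may be ungrammatical, which is one reason a gloss-based or LLM-based generator might be preferred.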
Related papers
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)
- Speech Synthesis As Augmentation For Low-resource ASR (2020)
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)
- Custom Data Augmentation For Low Resource ASR Using Bark And Retrieval-based Voice Conversion (2023)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)
- Low-data? No Problem: Low-resource, Language-agnostic Conversational Text-to-speech Via F0-conditioned Data Augmentation (2022)
- Reduce, Reuse, Recycle: Is Perturbed Data Better Than Other Language Augmentation For Low Resource Self-supervised Speech Models (2023)