Reduce, Reuse, Recycle: Is Perturbed Data Better Than Other Language Augmentation For Low Resource Self-supervised Speech Models
2023 · Asad Ullah, Alessandro Ragano, Andrew Hines
Abstract
Self-supervised representation learning (SSRL) has demonstrated better performance than supervised models on tasks including phoneme recognition. Training SSRL models is challenging for low-resource languages, where sufficient pre-training data may not be available. A common approach is cross-lingual pre-training. Instead, we propose using audio augmentation techniques, namely pitch variation, noise addition, accented target-language speech and other-language speech, to pre-train SSRL models in a low-resource condition, and we evaluate the resulting models on phoneme recognition. Our comparisons found that a combined synthetic augmentation (noise/pitch) strategy outperformed accent and language knowledge transfer. Furthermore, we examined the scaling factor of augmented data needed to achieve performance equivalent to a model pre-trained with target-domain speech. Our findings suggest that for resource-constrained languages, combined augmentations can be a more viable option than other augmentations.
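The abstract names pitch variation and noise addition as the synthetic augmentations and reports that combining them works best. The following is a minimal Python sketch of what such augmentations commonly look like, assuming mono audio stored as a list of floats; it is not the authors' implementation. The function names, the ±3 semitone range and the 5 to 20 dB SNR range are hypothetical choices for illustration only.

```python
import math
import random


def power(x):
    """Mean power of a signal (average of squared samples)."""
    return sum(s * s for s in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Mix `noise` into `signal` so the mixture has the requested SNR in dB.

    The noise is tiled or truncated to the signal length, then scaled by a
    gain g chosen so that 10*log10(P_signal / (g**2 * P_noise)) == snr_db.
    Assumes the noise is not all zeros.
    """
    noise = [noise[i % len(noise)] for i in range(len(signal))]
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def resample_pitch(signal, semitones):
    """Naive pitch variation by linear-interpolation resampling.

    Reading the signal at rate 2**(semitones/12) raises (or lowers) the pitch
    by `semitones`, but also shortens (or lengthens) the clip, like changing
    tape speed. Duration-preserving pitch shifting needs e.g. a phase vocoder.
    """
    ratio = 2 ** (semitones / 12)
    out_len = int(len(signal) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = signal[j + 1] if j + 1 < len(signal) else signal[j]
        out.append((1 - frac) * signal[j] + frac * nxt)
    return out


def combined_augment(signal, noise, rng):
    """Hypothetical combined pitch + noise augmentation with random parameters."""
    shifted = resample_pitch(signal, rng.uniform(-3, 3))
    return add_noise_at_snr(shifted, noise, rng.uniform(5, 20))
```

For example, `add_noise_at_snr(tone, noise, 10)` mixes Gaussian noise into a clean tone at 10 dB SNR, and subtracting the tone from the result recovers noise whose power is exactly one tenth of the tone's. The resampling approach was chosen only because it fits in a few stdlib lines; a real pipeline would more likely use a library pitch-shift effect that keeps the clip length fixed.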