Automatic Data Augmentation For Domain Adapted Fine-tuning Of Self-supervised Speech Representations
2023 · Salah Zaiem, Titouan Parcollet, Slim Essid
Abstract
Self-Supervised Learning (SSL) has made it possible to leverage large amounts of unlabeled speech data to improve the performance of speech recognition models, even with small annotated datasets. Despite this, speech SSL representations may fail when facing an acoustic mismatch between the pretraining and target datasets. To address this issue, we propose a novel supervised domain adaptation method designed for cases exhibiting such a mismatch in acoustic domains. It consists of applying properly calibrated data augmentations to a large clean dataset, bringing it closer to the target domain, and using it as part of an initial fine-tuning stage. Augmentations are automatically selected by minimizing a conditional-dependence estimator, based on the target dataset. The approach is validated in an oracle experiment with controlled distortions and on two amateur-collected low-resource domains, outperforming the baselines in both cases.
Related papers
- Boosting Cross-domain Speech Recognition With Self-supervision (2022) 0.00
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017) 14.23
- Self-supervised Learning Based Domain Adaptation For Robust Speaker Verification (2021) 11.49
- Automatic Data Augmentation Selection And Parametrization In Contrastive Self-supervised Speech Representation Learning (2022) 5.24
- AC-Mix: Self-supervised Adaptation For Low-resource Automatic Speech Recognition Using Agnostic Contrastive Mixup (2024) 2.26
- How To Learn A New Language? An Efficient Solution For Self-supervised Learning Models Unseen Languages Adaption In Low-resource Scenario (2024) 0.00
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022) 0.00
- PADA: Pruning Assisted Domain Adaptation For Self-supervised Speech Representations (2022) 5.24