Unsupervised Fine-tuning Data Selection For ASR Using Self-supervised Speech Models
2022 · Reem Gody, David Harwath
Abstract
Self-supervised learning (SSL) has been able to leverage unlabeled data to boost the performance of automatic speech recognition (ASR) models when we have access to only a small amount of transcribed speech data. However, this raises the question of which subset of the available unlabeled data should be selected for transcription. Our work investigates different unsupervised data selection techniques for fine-tuning the HuBERT model under a limited transcription budget. We investigate the impact of speaker diversity, gender bias, and topic diversity on downstream ASR performance. We also devise two novel techniques for unsupervised data selection: pre-training-loss-based data selection and the perplexity of byte-pair-encoded clustered units (PBPE), and we show how these techniques compare to pure random data selection. Finally, we analyze the correlations between the inherent characteristics of the selected fine-tuning subsets as well as how these characteristics correlate with the
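The abstract frames data selection as choosing which unlabeled utterances to transcribe under a fixed budget, comparing score-based criteria against random selection. The sketch below illustrates only that framing; it is not the authors' implementation. All names (`Utterance`, `fill_budget`, `select_random`, `select_by_score`) are hypothetical. The budget is assumed to be measured in seconds of audio. The per-utterance `score` is assumed to be precomputed elsewhere, for example as a pre-training loss or a PBPE value. Whether higher or lower scores should be preferred is left as a parameter, because the abstract does not say.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    duration_s: float  # assumed: budget is counted in seconds of audio
    score: float       # hypothetical precomputed score (e.g. a pre-training loss)


def fill_budget(candidates, budget_s):
    """Walk candidates in order, keeping each one that still fits in the budget."""
    chosen, used = [], 0.0
    for utt in candidates:
        if used + utt.duration_s <= budget_s:
            chosen.append(utt)
            used += utt.duration_s
    return chosen


def select_random(pool, budget_s, seed=0):
    """Random baseline: shuffle the pool reproducibly, then fill the budget."""
    order = list(pool)
    random.Random(seed).shuffle(order)
    return fill_budget(order, budget_s)


def select_by_score(pool, budget_s, highest_first=True):
    """Rank by score (preferred direction is an assumption), then fill the budget."""
    order = sorted(pool, key=lambda u: u.score, reverse=highest_first)
    return fill_budget(order, budget_s)
```

Consider a pool of `Utterance("a", 4.0, 2.1)`, `Utterance("b", 6.0, 3.5)`, `Utterance("c", 5.0, 1.2)`, and `Utterance("d", 3.0, 2.9)` with a 10-second budget. In this case `select_by_score` keeps `b` and `d`. With `highest_first=False`, it keeps `c` and `a`. Keeping the budget-filling step separate from the ranking step means any scoring criterion can be compared against the random baseline under the same budget.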
Related papers
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)
- Unsupervised Active Learning: Optimizing Labeling Cost-effectiveness For Automatic Speech Recognition (2023)
- Boosting Cross-domain Speech Recognition With Self-supervision (2022)
- Deploying Self-supervised Learning In The Wild For Hybrid Automatic Speech Recognition (2022)
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022)
- Pushing The Limits Of Unsupervised Unit Discovery For SSL Speech Representation (2023)
- Unispeech-sat: Universal Speech Representation Learning With Speaker Aware Pre-training (2021)
- Automatic Data Augmentation For Domain Adapted Fine-tuning Of Self-supervised Speech Representations (2023)