LASER: Learning By Aligning Self-supervised Representations Of Speech For Improving Content-related Tasks
2024 Β· Amit Meghanani, Thomas Hain
Abstract
Self-supervised learning (SSL)-based speech models are extensively used for full-stack speech processing. However, it has been observed that improving SSL-based speech representations using unlabeled speech for content-related tasks is challenging and computationally expensive. Recent attempts have been made to address this issue with cost-effective self-supervised fine-tuning (SSFT) approaches. Continuing in this direction, a cost-effective SSFT method named "LASER: Learning by Aligning Self-supervised Representations" is presented. LASER is based on the soft-DTW alignment loss with temporal regularisation term. Experiments are conducted with HuBERT and WavLM models and evaluated on the SUPERB benchmark for two content-related tasks: automatic speech recognition (ASR) and phoneme recognition (PR). A relative improvement of 3.7% and 8.2% for HuBERT, and 4.1% and 11.7% for WavLM are observed, for the ASR and PR tasks respectively, with only < 3 hours of fine-tuning on a single GPU.
Authors
(none)
Tags
Stats
Related papers
- SCORE: Self-supervised Correspondence Fine-tuning For Improved Content Representations (2024)0.00
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)0.00
- Fast-hubert: An Efficient Training Framework For Self-supervised Speech Representation Learning (2023)0.00
- Unispeech-sat: Universal Speech Representation Learning With Speaker Aware Pre-training (2021)0.00
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)7.50
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)8.35
- Unsupervised Active Learning: Optimizing Labeling Cost-effectiveness For Automatic Speech Recognition (2023)0.00
- Star: Distilling Speech Temporal Relation For Lightweight Speech Self-supervised Learning Models (2023)5.24