Investigation Of Ensemble Features Of Self-supervised Pretrained Models For Automatic Speech Recognition
2022 · A Arunkumar, Vrunda N Sukhadia, S. Umesh
Abstract
Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks. Several state-of-the-art SSL models are available, and each of these models optimizes a different loss, which raises the possibility that their features are complementary. This paper proposes using an ensemble of such SSL representations and models, which exploits the complementary nature of the features extracted by the various pre-trained models. We hypothesize that this results in a richer feature representation, and we show results for the ASR downstream task. To this end, we use three SSL models that have shown excellent results on ASR tasks, namely HuBERT, wav2vec 2.0, and WavLM. We explore the ensemble of models fine-tuned for the ASR task and the ensemble of features using the embeddings obtained from the pre-trained models for a downstream ASR task. We get improved performance over individual models and pre-t
Related papers
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)
- Exploring Effective Fusion Algorithms For Speech Based Self-supervised Learning Models (2022)
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022)
- Feature Learning And Ensemble Pre-tasks Based Self-supervised Speech Denoising And Dereverberation (2022)
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)
- Automatic Pronunciation Assessment Using Self-supervised Speech Representation Learning (2022)
- Investigating Self-supervised Learning For Speech Enhancement And Separation (2022)