FeaRLESS: Feature Refinement Loss For Ensembling Self-supervised Learning Features In Robust End-to-end Speech Recognition
2022 · Szu-Jui Chen, Jiamin Xie, John H. L. Hansen
Abstract
Self-supervised learning representations (SSLR) have resulted in robust features for downstream tasks in many fields. Recently, several SSLRs have shown promising results on automatic speech recognition (ASR) benchmark corpora. However, previous studies have only shown performance for solitary SSLRs as an input feature for ASR models. In this study, we propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. In addition, we will show there are correlations between these extracted SSLRs. As such, we further propose a feature refinement loss for decorrelation to efficiently combine the set of input features. For evaluation, we show that the proposed 'FeaRLESS learning features' perform better than systems without the proposed feature refinement loss for both the WSJ and Fearless Steps Challenge (FSC) corpora.
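The abstract does not give the form of the feature refinement loss, so the following is only a minimal sketch of the general idea it describes: penalizing correlation between two extracted feature streams before fusing them. The function name `cross_correlation_penalty`, the mean-squared Pearson cross-correlation, and the concatenation fusion are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def cross_correlation_penalty(feat_a, feat_b, eps=1e-8):
    """Hypothetical decorrelation penalty (an assumption, not the paper's loss).

    feat_a, feat_b: arrays of shape (frames, dims) from two SSLR extractors.
    Returns the mean squared Pearson cross-correlation between the two sets
    of feature dimensions; values near 0 mean the streams are decorrelated.
    """
    # Standardize every feature dimension over the time axis.
    a = (feat_a - feat_a.mean(axis=0)) / (feat_a.std(axis=0) + eps)
    b = (feat_b - feat_b.mean(axis=0)) / (feat_b.std(axis=0) + eps)
    # (dims_a, dims_b) matrix of correlations between the two streams.
    corr = a.T @ b / feat_a.shape[0]
    return float(np.mean(corr ** 2))

# Synthetic stand-ins for two SSLR feature streams (not real model outputs).
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 8))
y_indep = rng.standard_normal((500, 8))          # unrelated to x
y_corr = x + 0.1 * rng.standard_normal((500, 8))  # nearly a copy of x

penalty_indep = cross_correlation_penalty(x, y_indep)
penalty_corr = cross_correlation_penalty(x, y_corr)

# Simple concatenation fusion along the feature axis (one of many options).
fused = np.concatenate([x, y_indep], axis=1)
```

In a training setup, a term like this would presumably be added to the ASR objective with a weighting factor, so that the combined features carry less redundant information; how the paper weights or structures its loss is not stated in the abstract.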
Related papers
- EFFUSE: Efficient Self-supervised Feature Fusion For E2E ASR In Low Resource And Multilingual Scenarios (2023)
- Fusion Of Discrete Representations And Self-augmented Representations For Multilingual Automatic Speech Recognition (2024)
- End-to-end Integration Of Speech Recognition, Dereverberation, Beamforming, And Self-supervised Learning Representation (2022)
- Exploring Effective Fusion Algorithms For Speech Based Self-supervised Learning Models (2022)
- Investigation Of Ensemble Features Of Self-supervised Pretrained Models For Automatic Speech Recognition (2022)
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)
- Feature Learning And Ensemble Pre-tasks Based Self-supervised Speech Denoising And Dereverberation (2022)
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)