Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR
2024 · Weiqing Wang, Kunal Dhawan, Taejin Park, et al.
Abstract
Speech foundation models have achieved state-of-the-art (SoTA) performance across various tasks, such as automatic speech recognition (ASR) in hundreds of languages. However, multi-speaker ASR remains a challenging task for these models due to data scarcity and sparsity. In this paper, we present approaches to enable speech foundation models to process and understand multi-speaker speech with limited training data. Specifically, we adapt a speech foundation model for the multi-speaker ASR task using only telephonic data. Remarkably, the adapted model also performs well on meeting data without any fine-tuning, demonstrating the generalization ability of our approach. We conduct several ablation studies to analyze the impact of different parameters and strategies on model performance. Our findings highlight the effectiveness of our methods. Results show that fewer parameters give better overall cpWER, which, although counter-intuitive, provides insights into adapting speech foundation mod
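The abstract reports results in cpWER (concatenated minimum-permutation word error rate) without defining it. As a rough illustration only, the sketch below follows the metric as it is commonly defined: each speaker's utterances are concatenated, every assignment of hypothesis speakers to reference speakers is tried, and the assignment with the fewest word errors is scored against the total number of reference words. The function names `edit_distance` and `cpwer` and the dictionary input format are assumptions made for this example; the scoring tool and text normalization used by the authors are not stated in the abstract.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cpwer(ref_by_speaker, hyp_by_speaker):
    """cpWER sketch: inputs map a speaker label to a list of utterance strings.

    Hypothetical helper for illustration, not the authors' scoring code.
    Brute-force over permutations, so only practical for a few speakers.
    """
    # Concatenate each speaker's utterances into one word sequence.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]

    # Pad the shorter side with empty speakers so the counts match.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Keep the speaker assignment with the fewest total word errors.
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


ref = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hyp = {"spk0": ["fine thanks"], "spk1": ["hello there how are we"]}
score = cpwer(ref, hyp)  # one substitution over seven reference words
```

Because the best permutation is chosen, the speaker labels in the hypothesis do not need to match the reference labels; only the grouping of words by speaker matters.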
Related papers
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021)
- Resource-efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion (2022)
- Parameter-efficient Adaptation Of Multilingual Multimodal Models For Low-resource ASR (2024)
- Multi-view Multi-task Modeling With Speech Foundation Models For Speech Forensic Tasks (2024)
- Structured Speaker-deficiency Adaptation Of Foundation Models For Dysarthric And Elderly Speech Recognition (2024)
- Factorised Speaker-environment Adaptive Training Of Conformer Speech Recognition Systems (2023)
- ADAPTERMIX: Exploring The Efficacy Of Mixture Of Adapters For Low-resource TTS Adaptation (2023)
- Self-taught Recognizer: Toward Unsupervised Adaptation For Speech Foundation Models (2024)