Parameter-efficient Adaptation Of Multilingual Multimodal Models For Low-resource ASR
2024 · Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, et al.
Abstract
Automatic speech recognition (ASR) for low-resource languages remains a challenge due to the scarcity of labeled training data. Parameter-efficient fine-tuning and text-only adaptation are two popular methods that have been used to address such low-resource settings. In this work, we investigate how these techniques can be effectively combined using a multilingual multimodal model like SeamlessM4T. Multimodal models are able to leverage unlabeled text via text-only adaptation with further parameter-efficient ASR fine-tuning, thus boosting ASR performance. We also show cross-lingual transfer from a high-resource language, achieving up to a relative 17% WER reduction over a baseline in a zero-shot setting without any labeled speech.
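The headline figure above is a relative, not absolute, WER reduction. As a minimal sketch of how that metric is usually computed, the Python below derives WER from word-level edit distance and then takes the relative reduction against a baseline. The function names and the numbers in the usage lines are hypothetical illustrations, not values or code from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by the adapted system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers chosen only to illustrate the arithmetic:
# a drop from 40.0 to 33.2 WER is a 6.8-point absolute reduction
# and a 17% relative reduction.
example = relative_wer_reduction(40.0, 33.2)
print(f"{example:.2%}")
```

Reporting the relative figure makes results comparable across languages whose baseline WERs differ widely, which is the usual situation in low-resource settings.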
Related papers
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021)
- Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR (2024)
- Transfer Learning Of Language-independent End-to-end ASR With Language Model Fusion (2018)
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition (2022)
- Leveraging Parameter-efficient Transfer Learning For Multi-lingual Text-to-speech Adaptation (2024)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological Features (2021)
- An Initial Investigation Of Language Adaptation For TTS Systems Under Low-resource Scenarios (2024)
- Meta Learning For End-to-end Low-resource Speech Recognition (2019)