Investigating Zero-shot Generalizability On Mandarin-English Code-switched ASR And Speech-to-text Translation Of Recent Foundation Models With Self-supervision And Weak Supervision
2023 · Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, et al.
Abstract
This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performance close to that of the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement: they kept making similar mistakes and performed unsatisfactorily on intra-sentential code-switching. In addition, we explored the validity of several variants of Whisper and concluded that they remained effective in a code-switching scenario. Similar techniques for self-supervised models are worth studying to boost performance on code-switched tasks.
Related papers
- Cross-lingual Transfer Learning For Speech Translation (2024)
- On The Transferability Of Whisper-based Representations For "in-the-wild" Cross-task Downstream Speech Applications (2023)
- Benchmarking Children's ASR With Supervised And Self-supervised Speech Foundation Models (2024)
- Investigating The Emergent Audio Classification Ability Of ASR Foundation Models (2023)
- Whistle: Data-efficient Multilingual And Crosslingual Speech Recognition Via Weakly Phonetic Supervision (2024)
- Using Heterogeneity In Semi-supervised Transcription Hypotheses To Improve Code-switched Speech Recognition (2021)
- Probing The Hidden Talent Of ASR Foundation Models For L2 English Oral Assessment (2025)
- Prompting The Hidden Talent Of Web-scale Speech Models For Zero-shot Task Generalization (2023)