Low-resourced Speech Recognition For Iu Mien Language Via Weakly-supervised Phoneme-based Multilingual Pre-training
2024 · Lukuan Dong, Donghong Qin, Fengbo Bai, et al.
Abstract
Mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme-based supervised pre-training, subword-based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense that annotated speech is very limited. With less than 10 hours of transcribed Iu Mien speech, this paper investigates and compares the three approaches for Iu Mien speech recognition. Our experiments are based on three recently released backbone models pretrained over the 10 languages from the CommonVoice dataset (CV-Lang10), which correspond to the three approaches for low-resourced ASR. We find that phoneme supervision achieves better results than subword supervision and self-supervision, thereby providing higher data efficiency. Particularly, the Whistle models, i.e
Related papers
- Whistle: Data-efficient Multilingual And Crosslingual Speech Recognition Via Weakly Phonetic Supervision (2024)
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025)
- Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models With Diverse Speech Variabilities (2024)
- Towards Building Speech Large Language Models For Multitask Understanding In Low-resource Languages (2025)
- Investigating Zero-shot Generalizability On Mandarin-English Code-switched ASR And Speech-to-text Translation Of Recent Foundation Models With Self-supervision And Weak Supervision (2023)
- Multilingual And Unsupervised Subword Modeling For Zero-resource Languages (2018)
- Exploiting Cross-lingual Speaker And Phonetic Diversity For Unsupervised Subword Modeling (2019)
- Weighted Cross-entropy For Low-resource Languages In Multilingual Speech Recognition (2024)