The ASRU 2019 Mandarin-English Code-switching Speech Recognition Challenge: Open Datasets, Tracks, Methods And Results
2020 · Xian Shi, Qiangze Feng, Lei Xie
Abstract
Code-switching (CS) is a common phenomenon, and recognizing CS speech is challenging. However, CS speech data is scarce and there is no common testbed for the relevant research. This paper describes the design and main outcomes of the ASRU 2019 Mandarin-English code-switching speech recognition challenge, which aims to improve ASR performance in Mandarin-English code-switching scenarios. 500 hours of Mandarin speech data and 240 hours of Mandarin-English intra-sentential CS data were released to the participants. Three tracks were set up to advance the acoustic model (AM) and language model (LM) components of the traditional DNN-HMM ASR system and to explore the performance of end-to-end (E2E) models. The paper then presents an overview of the results and system performance in the three tracks. The results show that the traditional ASR system benefits from the pronunciation lexicon, CS text generation, and data augmentation. In the E2E track, however, the results highlight the importance of using language identification, building up a rational set of modeling units a
Related papers
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) 0.00
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021) 5.24
- Code-switching Detection With Data-augmented Acoustic And Language Models (2018) 3.58
- Towards End-to-end Code-switching Speech Recognition (2018) 0.00
- On The End-to-end Solution To Mandarin-English Code-switching Speech Recognition (2018) 12.10
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022) 0.00
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021) 5.84
- End-to-end Code-switching ASR For Low-resourced Language Pairs (2019) 9.76