Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives
2025 · Hexin Liu, Haoyang Zhang, Qiquan Zhang, et al.
Abstract
Code-switching automatic speech recognition (CS-ASR) presents unique challenges due to language confusion introduced by spontaneous intra-sentence switching and accent bias that blurs the phonetic boundaries. Although the constituent languages may be individually high-resource, the scarcity of annotated code-switching data further compounds these challenges. In this paper, we systematically analyze CS-ASR from both model-centric and data-centric perspectives. By comparing state-of-the-art algorithmic methods, including language-specific processing and auxiliary language-aware multi-task learning, we discuss their varying effectiveness across datasets with different linguistic characteristics. On the data side, we first investigate TTS as a data augmentation method. By varying the textual characteristics and speaker accents, we analyze the impact of language confusion and accent bias on CS-ASR. To further mitigate data scarcity and enhance textual diversity, we propose a prompting strat
Related papers
- Code-switching Detection With Data-augmented Acoustic And Language Models (2018)
- The ASRU 2019 Mandarin-English Code-switching Speech Recognition Challenge: Open Datasets, Tracks, Methods And Results (2020)
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022)
- Language Modeling For Code-switching: Evaluation, Integration Of Monolingual Data, And Discriminative Training (2018)
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023)
- End-to-end Code-switching ASR For Low-resourced Language Pairs (2019)
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023)
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018)