Language Modeling For Code-switching: Evaluation, Integration Of Monolingual Data, And Discriminative Training
2018 · Hila Gonen, Yoav Goldberg
Abstract
We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR). Language modeling for code-switched language is challenging for (at least) three reasons: (1) lack of available large-scale code-switched data for training; (2) lack of a replicable evaluation setup that is ASR directed yet isolates language modeling performance from the other intricacies of the ASR system; and (3) the reliance on generative modeling. We tackle these three issues: we propose an ASR-motivated evaluation setup which is decoupled from an ASR system and the choice of vocabulary, and provide an evaluation dataset for English-Spanish code-switching. This setup lends itself to a discriminative training approach, which we demonstrate to work better than generative language modeling. Finally, we explore a variety of training protocols and verify the effectiveness of training with large amounts of monolingual data followed by fine-tuning with small amoun
Related papers
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) 0.00
- Using Heterogeneity In Semi-supervised Transcription Hypotheses To Improve Code-switched Speech Recognition (2021) 0.00
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023) 9.92
- Towards One Model To Rule All: Multilingual Strategy For Dialectal Code-switching Arabic ASR (2021) 9.03
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023) 8.09
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018) 9.92
- Generative Error Correction For Code-switching Speech Recognition Using Large Language Models (2023) 0.00
- Constrained Output Embeddings For End-to-end Code-switching Speech Recognition With Only Monolingual Data (2019) 7.16