Code-switched Language Models Using Neural Based Synthetic Data From Parallel Sentences
2019 Β· Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, et al.
Abstract
Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this issue. However, this require external word alignments or constituency parsers that create erroneous results on distant languages. We propose a sequence-to-sequence model using a copy mechanism to generate code-switching data by leveraging parallel monolingual translations from a limited source of code-switching data. The model learns how to combine words from parallel sentences and identifies when to switch one language to the other. Moreover, it captures code-switching constraints by attending and aligning the words in inputs, without requiring any external knowledge. Based on experimental results, the language model trained with the generated sentences achieves state-of-the-art performance and improves end-to-end automatic speech recognition.
Authors
(none)
Tags
Stats
Related papers
- Code-switching Sentence Generation By Generative Adversarial Networks And Its Application To Data Augmentation (2018)0.00
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022)0.00
- Constrained Output Embeddings For End-to-end Code-switching Speech Recognition With Only Monolingual Data (2019)7.16
- Language Modeling For Code-switching: Evaluation, Integration Of Monolingual Data, And Discriminative Training (2018)5.24
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025)0.00
- Generative Error Correction For Code-switching Speech Recognition Using Large Language Models (2023)0.00
- Enhancing Code-switched Text-to-speech Synthesis Capability In Large Language Models With Only Monolingual Corpora (2024)0.00
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021)5.84