Syntactic And Semantic Features For Code-switching Factored Language Models
2017 · Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, et al.
Abstract
This paper presents our latest investigations into features for factored language models for Code-Switching speech and their effect on automatic speech recognition (ASR) performance. We focus on syntactic and semantic features that can be extracted from Code-Switching text data and integrate them into factored language models. We explore several possible factors: words, part-of-speech tags, Brown word clusters, open-class words, and clusters of open-class word embeddings. The experimental results reveal that Brown word clusters, part-of-speech tags, and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME. In ASR experiments, the model containing Brown word clusters and part-of-speech tags, and the model that additionally includes clusters of open-class word embeddings, yield the best mixed error rate results. In summary, the best language model can significantly reduce the perplexity on
Related papers
- Joint Modeling Of Code-switched And Monolingual ASR Via Conditional Factorization (2021) · 8.60
- Language Modeling For Code-switching: Evaluation, Integration Of Monolingual Data, And Discriminative Training (2018) · 5.24
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) · 0.00
- Using Heterogeneity In Semi-supervised Transcription Hypotheses To Improve Code-switched Speech Recognition (2021) · 0.00
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021) · 5.24
- Towards End-to-end Code-switching Speech Recognition (2018) · 0.00
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018) · 9.92
- Reducing Language Confusion For Code-switching Speech Recognition With Token-level Language Diarization (2022) · 10.07