Constrained Output Embeddings For End-to-end Code-switching Speech Recognition With Only Monolingual Data
2019 · Yerbolat Khassanov, Haihua Xu, Van Tung Pham, et al.
Abstract
The lack of code-switching training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of the monolingual languages to be similar, and hence helps the ASR model code-switch between languages more easily. Specifically, we propose to use constraints based on Jensen-Shannon divergence and cosine distance. The former enforces the output embeddings of the monolingual languages to have similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, which yields up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
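The two quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how distributions over embeddings are estimated, so the Jensen-Shannon function below takes two discrete probability vectors as an assumption, and the function names and toy embeddings are hypothetical.

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length embedding vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy (hypothetical) output embeddings for tokens of two languages.
mandarin_emb = [[1.0, 0.0], [0.8, 0.2]]
english_emb = [[0.0, 1.0], [0.2, 0.8]]

# Cosine-distance term: how far apart the two language centroids are.
centroid_gap = cosine_distance(centroid(mandarin_emb), centroid(english_emb))

# JSD term on two assumed discrete distributions.
jsd = js_divergence([0.5, 0.5], [0.9, 0.1])
```

In a training setup, terms like these would presumably be added to the ASR loss so that minimizing them pulls the two sets of embeddings together. The weighting and the exact form used in the paper are not given in the abstract.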
Related papers
- Data Augmentation For End-to-end Code-switching Speech Recognition (2020)
- An Effective Mixture-of-experts Approach For Code-switching Speech Recognition Leveraging Encoder Disentanglement (2024)
- Using Heterogeneity In Semi-supervised Transcription Hypotheses To Improve Code-switched Speech Recognition (2021)
- Language Modeling For Code-switching: Evaluation, Integration Of Monolingual Data, And Discriminative Training (2018)
- Code-switched Language Models Using Neural Based Synthetic Data From Parallel Sentences (2019)
- On The End-to-end Solution To Mandarin-english Code-switching Speech Recognition (2018)
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023)
- Towards End-to-end Code-switching Speech Recognition (2018)