Transformer-transducers For Code-switched Speech Recognition
2020 · Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, et al.
Abstract
We live in a world where 60% of the population can speak two or more languages fluently. Members of these communities constantly switch between languages during a conversation. As automatic speech recognition (ASR) systems are deployed in the real world, there is a need for practical systems that can handle multiple languages both within an utterance and across utterances. In this paper, we present an end-to-end ASR system using a transformer-transducer model architecture for code-switched speech recognition. We propose three modifications to the vanilla model in order to handle various aspects of code-switching. First, we introduce two auxiliary loss functions to handle the low-resource scenario of code-switching. Second, we propose a novel mask-based training strategy with language ID information to improve the label encoder training towards intra-sentential code-switching. Finally, we propose a multi-label/multi-audio encoder structure to leverage the vast monolingual sp
Related papers
- Decoupling Pronunciation And Language For End-to-end Code-switching Automatic Speech Recognition (2020)
- Dual-decoder Transformer For Joint Automatic Speech Recognition And Multilingual Speech Translation (2020)
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023)
- Towards End-to-end Code-switching Speech Recognition (2018)
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022)
- Multi-modal Transformers Utterance-level Code-switching Detection (2020)
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020)
- An Effective Mixture-of-experts Approach For Code-switching Speech Recognition Leveraging Encoder Disentanglement (2024)