Improving Zero-shot Chinese-English Code-switching ASR With kNN-CTC And Gated Monolingual Datastores
2024 · Jiaming Zhou, Shiwan Zhao, Hui Wang, et al.
Abstract
The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, applying it directly to multilingual scenarios such as code-switching presents challenges. Although there is potential for performance improvement, a kNN-CTC model that uses a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring the injection of language-specific information into the ASR process. We apply this framework to cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of our gated datastore mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR.
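The per-frame idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gate used here (pick the datastore matching the language of the frame's top CTC token), the skipping of blank frames, the softmax over negative squared distances, and the linear interpolation weight `lam` are all assumptions made for illustration, as are the names `knn_distribution`, `gated_knn_ctc_frame`, and `token_lang`.

```python
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens."""
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 distance to every key
    k = min(k, len(dists))
    nearest = np.argsort(dists)[:k]               # indices of the k closest keys
    logits = -dists[nearest] / temperature
    weights = np.exp(logits - logits.max())       # numerically stable softmax
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)    # sum weights of repeated tokens
    return p_knn

def gated_knn_ctc_frame(p_ctc, query, datastores, token_lang,
                        blank_id=0, lam=0.5, k=4):
    """Combine one frame's CTC posterior with a kNN distribution drawn from
    a single monolingual datastore chosen by a gate (assumed gate: language
    of the top CTC token)."""
    top = int(np.argmax(p_ctc))
    if top == blank_id:
        return p_ctc                              # assumption: no retrieval on blank frames
    keys, values = datastores[token_lang[top]]    # gate: use only one language's datastore
    p_knn = knn_distribution(query, keys, values, len(p_ctc), k=k)
    return lam * p_knn + (1.0 - lam) * p_ctc

# Toy setup (hypothetical): token 0 is blank, 1-2 are Chinese, 3-4 are English.
token_lang = {1: "zh", 2: "zh", 3: "en", 4: "en"}
datastores = {
    "zh": (np.array([[0.0, 0.0], [0.0, 1.0]]), np.array([1, 2])),
    "en": (np.array([[5.0, 5.0], [5.0, 6.0]]), np.array([3, 4])),
}
p_ctc = np.array([0.1, 0.5, 0.1, 0.2, 0.1])
query = np.array([0.0, 0.1])                      # frame embedding near the "zh" keys
mixed = gated_knn_ctc_frame(p_ctc, query, datastores, token_lang)

blank_frame = np.array([0.7, 0.1, 0.1, 0.05, 0.05])
unchanged = gated_knn_ctc_frame(blank_frame, query, datastores, token_lang)
```

In this toy setup the gate selects the Chinese datastore, so the English tokens receive no retrieval mass and only keep their scaled CTC probability. That is the sense in which a gated monolingual datastore avoids noise from the other language, whereas a single bilingual datastore could return neighbours from either.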
Related papers
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023) · 11.36
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) · 0.00
- Towards End-to-end Code-switching Speech Recognition (2018) · 0.00
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021) · 5.24
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023) · 8.09
- Reducing Spelling Inconsistencies In Code-switching ASR Using Contextualized CTC Loss (2020) · 4.52
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022) · 0.00
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023) · 9.92