Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding
2024 · Jiahui Zhao, Hao Shi, Chenrui Cui, et al.
Abstract
Code-switching (CS) automatic speech recognition (ASR) is challenging because of language confusion arising from accents, auditory similarity, and seamless language switches. Adapting pre-trained multilingual models has shown promising performance for CS-ASR. In this paper, we adapt Whisper, a large-scale multilingual pre-trained speech recognition model, to CS on both the encoder and decoder sides. First, we propose an encoder refiner to enhance the encoder's capacity to handle intra-sentence switching. Second, we propose using two sets of language-aware adapters with different language prompt embeddings to obtain language-specific decoding information in each decoder layer. Then, a fusion module is added to fuse the language-aware decoding. Experimental results on the SEAME dataset show that, compared with the baseline model, the proposed approach achieves relative mixed error rate (MER) reductions of 4.1% and 7.2% on the dev_man and dev_sge test sets, respectively, surpassing st
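The abstract reports results as relative MER reductions. As a rough illustration of how such a figure is typically computed for Mandarin-English speech, the sketch below counts Mandarin characters and English words as tokens, takes the edit distance against the reference, and compares two error rates. The function names, the tokenization regex, and the toy sentences and numbers are assumptions made for illustration; the paper's actual scoring script is not described in this listing.

```python
import re


def mixed_tokens(text):
    """Split text into Mandarin characters and English words (assumed convention)."""
    # Each CJK ideograph is one token; each run of Latin letters is one token.
    # Digits and punctuation are ignored in this simplified sketch.
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mer(refs, hyps):
    """Mixed error rate in percent over paired reference/hypothesis strings."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = mixed_tokens(ref), mixed_tokens(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


def relative_reduction(baseline, proposed):
    """Relative reduction in percent of `proposed` with respect to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Toy values only, not results from the paper.
toy_mer = mer(["我想 go shopping 明天"], ["我想 go shop 明天"])
toy_gain = relative_reduction(10.0, 9.0)
```

In this toy case the reference has six tokens (four characters and two words) and the hypothesis has one substitution, so `toy_mer` is one error in six tokens. A relative reduction is measured against the baseline's error rate, so it is not the same as an absolute drop in percentage points.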
Related papers
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025) · 3.29
- Multilingual DistilWhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023) · 8.09
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021) · 5.24
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) · 0.00
- Simul-Whisper: Attention-guided Streaming Whisper With Truncation Detection (2024) · 6.34
- M2R-Whisper: Multi-stage And Multi-scale Retrieval Augmentation For Enhancing Whisper (2024) · 6.77
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022) · 0.00
- Reducing Language Confusion For Code-switching Speech Recognition With Token-level Language Diarization (2022) · 10.07