CAMEL: Cross-attention Enhanced Mixture-of-experts And Language Bias For Code-switching Speech Recognition
2024 · He Wang, Xucheng Wan, Naijun Zheng, et al.
Abstract
Code-switching automatic speech recognition (ASR) aims to accurately transcribe speech that contains two or more languages. To better capture language-specific speech representations and address language confusion in code-switching ASR, the mixture-of-experts (MoE) architecture and an additional language diarization (LD) decoder are commonly employed. However, most existing work still fuses language-specific speech representations with simple operations such as weighted summation or concatenation, leaving considerable room to improve how language bias information is integrated. In this paper, we introduce CAMEL, a cross-attention-based MoE and language bias approach for code-switching ASR. Specifically, after each MoE layer, we fuse language-specific speech representations with cross-attention, leveraging its strong contextual modeling abilities. Additionally, we design a source attention-based mechanism to incorporate the language information from the LD decode
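To make the contrast between weighted summation and cross-attention fusion concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact formulation, so the identity projections (no learned query/key/value weights), the residual sum, and all names (h_zh, h_en, cross_attention_fusion) are assumptions chosen for illustration only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum_fusion(h_a, h_b, w_a=0.5, w_b=0.5):
    """Baseline: frame-wise weighted summation of two expert outputs."""
    return [[w_a * x + w_b * y for x, y in zip(fa, fb)]
            for fa, fb in zip(h_a, h_b)]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention; identity projections are an assumption."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        # Each output frame is an attention-weighted mix of the other stream.
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

def cross_attention_fusion(h_a, h_b):
    """Hypothetical fusion: each stream attends to the other, then all are summed."""
    a_ctx = cross_attention(h_a, h_b)  # stream A queries stream B
    b_ctx = cross_attention(h_b, h_a)  # stream B queries stream A
    return [[xa + ca + xb + cb for xa, ca, xb, cb in zip(fa, fca, fb, fcb)]
            for fa, fca, fb, fcb in zip(h_a, a_ctx, h_b, b_ctx)]

# Toy expert outputs: 3 frames, 2 feature dimensions (made-up values).
h_zh = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_en = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

baseline = weighted_sum_fusion(h_zh, h_en)
fused = cross_attention_fusion(h_zh, h_en)
```

Unlike the baseline, where frame t of the output depends only on frame t of each expert, every fused frame here can draw on all frames of the other language stream, which is the contextual modeling property the abstract attributes to cross-attention.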
Related papers
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023) · 9.92
- BA-MoE: Boundary-aware Mixture-of-experts Adapter For Code-switching Speech Recognition (2023) · 7.50
- Towards End-to-end Code-switching Speech Recognition (2018) · 0.00
- Language-routing Mixture Of Experts For Multilingual And Code-switching Speech Recognition (2023) · 9.03
- An Effective Mixture-of-experts Approach For Code-switching Speech Recognition Leveraging Encoder Disentanglement (2024) · 0.00
- LAE-ST-MoE: Boosted Language-aware Encoder Using Speech Translation Auxiliary Task For E2E Code-switching ASR (2023) · 6.34
- SC-MoE: Switch Conformer Mixture Of Experts For Unified Streaming And Non-streaming Code-switching ASR (2024) · 6.77
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) · 0.00