SC-MoE: Switch Conformer Mixture Of Experts For Unified Streaming And Non-streaming Code-switching ASR
2024 · Shuaishuai Ye, Shunfei Chen, Xinhui Hu, et al.
Abstract
In this work, we propose SC-MoE, a Switch-Conformer-based Mixture-of-Experts (MoE) system for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR). To achieve real-time streaming CS ASR, we design a streaming MoE layer for the encoder of SC-MoE. The layer consists of three language experts, corresponding to Mandarin, English, and blank, and uses a language identification (LID) network with a Connectionist Temporal Classification (CTC) loss as its router. To further utilize the language information embedded in text, we also incorporate MoE layers into the decoder of SC-MoE. In addition, we introduce routers into every MoE layer of the encoder and the decoder, which yields better recognition performance. Experimental results show that SC-MoE significantly improves CS ASR performance over the baseline with comparable computational efficiency.
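The encoder-side routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes top-1 (switch-style) routing, a single linear layer standing in for the LID network, small ReLU feed-forward experts, and random untrained weights. The CTC-based training of the router is not shown. All names (`StreamingMoELayer`, `route`, `forward_frame`, `LANGS`) are hypothetical.

```python
import math
import random

# Expert order is an assumption made for this sketch.
LANGS = ("mandarin", "english", "blank")


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(rng, d_in, d_out):
    """Random weights (one row per output unit) and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    return weights, [0.0] * d_out


def linear(x, params):
    """Apply y = Wx + b to a single feature vector."""
    weights, bias = params
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class StreamingMoELayer:
    """Toy MoE layer: a stand-in LID router picks one expert per frame."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = random.Random(seed)
        # Assumption: the LID router is reduced to one linear projection.
        self.router = make_linear(rng, d_model, len(LANGS))
        # One two-layer feed-forward expert per language class.
        self.experts = [
            (make_linear(rng, d_model, d_hidden), make_linear(rng, d_hidden, d_model))
            for _ in LANGS
        ]

    def route(self, frame):
        """Return the index of the most probable class and all probabilities."""
        probs = softmax(linear(frame, self.router))
        best = max(range(len(LANGS)), key=lambda i: probs[i])
        return best, probs

    def forward_frame(self, frame):
        """Process one frame using only that frame, so no lookahead is needed."""
        best, probs = self.route(frame)
        up, down = self.experts[best]
        hidden = [max(0.0, h) for h in linear(frame, up)]  # ReLU
        return linear(hidden, down), LANGS[best], probs


layer = StreamingMoELayer(d_model=4, d_hidden=8)
frames = [[0.5, -1.0, 0.2, 0.0], [1.5, 0.3, -0.7, 2.0]]
for frame in frames:
    out, lang, probs = layer.forward_frame(frame)
    print(lang, [round(p, 3) for p in probs])
```

Because each routing decision in this sketch depends only on the current frame, frames can be processed as they arrive, which is the property a streaming layer needs. A real router would presumably operate on contextual encoder features rather than raw frames.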
Related papers
- Language-routing Mixture Of Experts For Multilingual And Code-switching Speech Recognition (2023)
- LAE-ST-MoE: Boosted Language-aware Encoder Using Speech Translation Auxiliary Task For E2E Code-switching ASR (2023)
- SpeechMoE: Scaling To Large Acoustic Models With Dynamic Routing Mixture Of Experts (2021)
- Building A Great Multi-lingual Teacher With Sparsely-gated Mixture Of Experts For Speech Recognition (2021)
- Towards End-to-end Code-switching Speech Recognition (2018)
- CAMEL: Cross-attention Enhanced Mixture-of-experts And Language Bias For Code-switching Speech Recognition (2024)
- BA-MoE: Boundary-aware Mixture-of-experts Adapter For Code-switching Speech Recognition (2023)
- Integrating Knowledge In End-to-end Automatic Speech Recognition For Mandarin-English Code-switching (2021)