BA-MoE: Boundary-aware Mixture-of-experts Adapter For Code-switching Speech Recognition
2023 · Peikun Chen, Fan Yu, Yuhao Lian, et al.
Abstract
Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been applied successfully in code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). Specifically, we first introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Besides, we utilize a boundary-aware predictor to learn boundary representations for dealing with language bou
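The adapter-and-gate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bottleneck adapter shape, the softmax gate driven by the input frame, the two-language setup, and all names (`Adapter`, `GatedAdapterLayer`, `forward`) are assumptions made for illustration, and the cross-layer connections, the loss itself, and the boundary-aware predictor are left out. It only shows per-language adapters whose outputs are fused frame by frame with gate weights, plus the time-averaged adapter outputs on which a language adaptation loss could be computed.

```python
import math
import random


def linear(x, w, b):
    """Apply y = W x + b to a single frame x (a list of floats)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z):
    """Numerically stable softmax over a short list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init(rows, cols, rng):
    """Small random weight matrix (illustrative initialization only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down, self.b_down = init(bottleneck, dim, rng), [0.0] * bottleneck
        self.w_up, self.b_up = init(dim, bottleneck, rng), [0.0] * dim

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(x, self.w_down, self.b_down)]
        up = linear(h, self.w_up, self.b_up)
        return [xi + ui for xi, ui in zip(x, up)]


class GatedAdapterLayer:
    """One adapter per language, fused per frame by a shared softmax gate (assumed design)."""

    def __init__(self, dim, bottleneck, num_langs=2, seed=0):
        rng = random.Random(seed)
        self.adapters = [Adapter(dim, bottleneck, rng) for _ in range(num_langs)]
        self.w_gate, self.b_gate = init(num_langs, dim, rng), [0.0] * num_langs

    def forward(self, frames):
        fused, gates = [], []
        per_lang = [[] for _ in self.adapters]
        for x in frames:
            outs = [adapter(x) for adapter in self.adapters]
            g = softmax(linear(x, self.w_gate, self.b_gate))
            # Fused frame: gate-weighted sum of the language-specific outputs.
            fused.append([sum(gk * o[d] for gk, o in zip(g, outs)) for d in range(len(x))])
            gates.append(g)
            for k, o in enumerate(outs):
                per_lang[k].append(o)
        # Mean over time of each adapter's output, one vector per language.
        means = [[sum(col) / len(seq) for col in zip(*seq)] for seq in per_lang]
        return fused, gates, means


# Usage: 5 random frames of dimension 8 pushed through one layer.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
layer = GatedAdapterLayer(dim=8, bottleneck=4)
fused, gates, means = layer.forward(frames)
```

In this sketch `gates` holds one weight per language for every frame, and `means` holds one time-averaged vector per language adapter; a real system would feed those vectors into a trained classifier and loss rather than stop here.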
Related papers
- Language-routing Mixture Of Experts For Multilingual And Code-switching Speech Recognition (2023)
- CAMEL: Cross-attention Enhanced Mixture-of-experts And Language Bias For Code-switching Speech Recognition (2024)
- SC-MoE: Switch Conformer Mixture Of Experts For Unified Streaming And Non-streaming Code-switching ASR (2024)
- MoLE : Mixture Of Language Experts For Multi-lingual Automatic Speech Recognition (2023)
- SpeechMoE: Scaling To Large Acoustic Models With Dynamic Routing Mixture Of Experts (2021)
- LAE-ST-MoE: Boosted Language-aware Encoder Using Speech Translation Auxiliary Task For E2E Code-switching ASR (2023)
- An Effective Mixture-of-experts Approach For Code-switching Speech Recognition Leveraging Encoder Disentanglement (2024)
- Building A Great Multi-lingual Teacher With Sparsely-gated Mixture Of Experts For Speech Recognition (2021)