Aligning Speech To Languages To Enhance Code-switching Speech Recognition
2024 · Hexin Liu, Xiangyu Zhang, Haoyang Zhang, et al.
Abstract
Code-switching (CS) refers to the switching of languages within a speech signal and results in language confusion for automatic speech recognition (ASR). To address language confusion, we propose a language alignment loss (LAL) that aligns acoustic features to pseudo-language labels learned from the ASR decoder during ASR training. This approach enables frame-level language identification without the need for frame-level language annotations. To further tackle the complex token alternatives for language modeling in bilingual scenarios, we propose to employ large language models via a generative error correction method. A linguistic hint, derived from LAL outputs and decoded hypotheses, is introduced to guide the prompting and enhance the LLM-based generative error correction for CS-ASR. The proposed methods are evaluated on the SEAME dataset and data from the ASRU 2019 Mandarin-English code-switching speech recognition challenge. The incorporation of the proposed language alignment los
Related papers
- Generative Error Correction For Code-switching Speech Recognition Using Large Language Models (2023)
- Reducing Language Confusion For Code-switching Speech Recognition With Token-level Language Diarization (2022)
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022)
- Exploring Retraining-free Speech Recognition For Intra-sentential Code-switching (2021)
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023)
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025)
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023)
- Code-switching Without Switching: Language Agnostic End-to-end Speech Translation (2022)