Speech Collage: Code-switched Audio Generation By Collaging Monolingual Corpora
2023 · Amir Hussein, Dorsa Zeinali, Ondřej Klejch, et al.
Abstract
Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness of the generated audio using an overlap-add approach. We investigate the impact of the generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results show up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for the in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation strengthens the model's inclination to code-switch and reduces its monolingual bias.
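The splicing-with-overlap-add idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `collage_segments`, the linear cross-fade window, and the fixed `overlap` length in samples are all assumptions made for illustration, and the abstract does not say how segments are selected or aligned.

```python
import numpy as np


def collage_segments(segments, overlap):
    """Join 1-D audio segments, cross-fading `overlap` samples at each boundary.

    Hypothetical helper: a linear fade is assumed here; the paper's actual
    window and overlap length are not given in the abstract.
    """
    if not segments:
        raise ValueError("need at least one segment")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    out = np.asarray(segments[0], dtype=np.float64).copy()
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=np.float64)
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        if n > 0:
            # Fade-in and fade-out weights sum to 1 at every sample.
            fade_in = (np.arange(n) + 0.5) / n
            out[-n:] = out[-n:] * (1.0 - fade_in) + seg[:n] * fade_in
        # Append the part of the new segment that lies past the overlap.
        out = np.concatenate([out, seg[n:]])
    return out


# Usage: three stand-in "monolingual" segments joined with a 20-sample overlap.
first = np.ones(100)
second = np.ones(80)
third = np.ones(50)
mixed = collage_segments([first, second, third], overlap=20)
```

Because the two fade weights sum to one, a constant signal passes through each join unchanged, which is the property that keeps the boundary from producing an abrupt level change.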
Related papers
- Textual Data Augmentation For Arabic-english Code-switching Speech Recognition (2022)
- Language-agnostic Code-switching In Sequence-to-sequence Speech Recognition (2022)
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025)
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018)
- Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS (2020)
- End-to-end Code-switching ASR For Low-resourced Language Pairs (2019)
- Unified Model For Code-switching Speech Recognition And Language Identification Based On A Concatenated Tokenizer (2023)
- Data Augmentation For End-to-end Code-switching Speech Recognition (2020)