Improving Low Resource Code-switched ASR Using Augmented Code-switched TTS
2020 · Yash Sharma, Basil Abraham, Karan Taneja, et al.
Abstract
Building Automatic Speech Recognition (ASR) systems for code-switched speech has recently gained renewed attention due to the widespread use of speech technologies in multilingual communities worldwide. End-to-end ASR systems are a natural modeling choice due to their ease of use and superior performance in monolingual settings. However, it is well known that end-to-end systems require large amounts of labeled speech. In this work, we investigate improving code-switched ASR in low resource settings via data augmentation using code-switched text-to-speech (TTS) synthesis. We propose two targeted techniques to effectively leverage TTS speech samples: 1) Mixup, an existing technique to create new training samples via linear interpolation of existing samples, applied to TTS and real speech samples, and 2) a new loss function, used in conjunction with TTS samples, to encourage code-switched predictions. We report significant improvements in ASR performance achieving absolute word error rate
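The Mixup step named in the abstract, linear interpolation between a real and a TTS sample, can be illustrated with a minimal sketch. This is not the authors' implementation: the feature shapes, the truncation to the shorter utterance, the Beta(0.4, 0.4) draw for the mixing weight, and the function names `mixup_pair` and `mixup_loss` are all assumptions made for illustration. Interpolating the two per-sample losses with the same weight follows the general Mixup recipe and may differ from what the paper does.

```python
import numpy as np


def mixup_pair(real_feats, tts_feats, lam):
    """Linearly interpolate two feature matrices of shape (frames, dims).

    Assumption: both inputs are truncated to the shorter utterance so
    that their shapes match before interpolation.
    """
    n = min(len(real_feats), len(tts_feats))
    return lam * real_feats[:n] + (1.0 - lam) * tts_feats[:n]


def mixup_loss(loss_real, loss_tts, lam):
    """Combine the losses against each transcript with the same weight."""
    return lam * loss_real + (1.0 - lam) * loss_tts


# Hypothetical stand-ins for 80-dimensional filterbank features.
rng = np.random.default_rng(0)
real = rng.normal(size=(120, 80))  # real speech utterance
tts = rng.normal(size=(100, 80))   # synthesized utterance

lam = rng.beta(0.4, 0.4)           # assumed mixing-weight distribution
mixed = mixup_pair(real, tts, lam)
print(mixed.shape)
```

Setting `lam` to 1.0 recovers the (truncated) real sample and 0.0 recovers the TTS sample, so the weight controls how much the synthetic audio contributes to each training example.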
Related papers
- Data Augmentation For End-to-end Code-switching Speech Recognition (2020) 9.92
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022) 6.77
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020) 11.49
- Improving Code-switching And Named Entity Recognition In ASR With Speech Editing Based Data Augmentation (2023) 6.34
- Code-switching Speech Recognition Under The Lens: Model- And Data-centric Perspectives (2025) 0.00
- Acoustic And Textual Data Augmentation For Improved ASR Of Code-switching Speech (2018) 9.92
- End-to-end Code-switching ASR For Low-resourced Language Pairs (2019) 9.76
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019) 12.68