Training Data Augmentation For Dysarthric Automatic Speech Recognition By Text-to-dysarthric-speech Synthesis
2024 · Wing-Zin Leung, Mattias Cross, Anton Ragni, et al.
Abstract
Automatic speech recognition (ASR) research has achieved impressive performance in recent years and has significant potential for enabling access for people with dysarthria (PwD) in augmentative and alternative communication (AAC) and home environment systems. However, progress in dysarthric ASR (DASR) has been limited by the high variability of dysarthric speech and the limited public availability of dysarthric training data. This paper demonstrates that data augmentation using text-to-dysarthric-speech (TTDS) synthesis for fine-tuning large ASR models is effective for DASR. Specifically, diffusion-based text-to-speech (TTS) models can produce speech samples similar to dysarthric speech, which can then be used as additional training data for fine-tuning ASR foundation models, in this case Whisper. Results show improved synthesis metrics and ASR performance for the proposed multi-speaker diffusion-based TTDS data augmentation for ASR fine-tuning compared to current DASR baselines.
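The augmentation idea in the abstract, adding synthesized dysarthric speech to the real training data before fine-tuning, can be illustrated with a minimal sketch. The abstract does not state how real and synthetic utterances are mixed, so everything here is an assumption made for illustration: the `Utterance` record, the `build_augmented_set` helper, the `synth_per_real` ratio, and the file paths are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical manifest entry; field names are illustrative only."""
    speaker: str
    text: str
    audio_path: str
    synthetic: bool


def build_augmented_set(real, synthetic, synth_per_real=1.0, seed=0):
    """Return all real utterances plus a per-speaker sample of synthetic ones.

    synth_per_real is an assumed knob: the maximum number of synthetic
    utterances added per real utterance of the same speaker.
    """
    rng = random.Random(seed)

    # Group both pools by speaker so augmentation stays speaker-matched.
    real_by_speaker = {}
    for utt in real:
        real_by_speaker.setdefault(utt.speaker, []).append(utt)
    synth_by_speaker = {}
    for utt in synthetic:
        synth_by_speaker.setdefault(utt.speaker, []).append(utt)

    mixed = list(real)
    for speaker, real_utts in real_by_speaker.items():
        pool = synth_by_speaker.get(speaker, [])
        # Never request more synthetic samples than the pool contains.
        budget = min(len(pool), round(synth_per_real * len(real_utts)))
        mixed.extend(rng.sample(pool, budget))

    rng.shuffle(mixed)
    return mixed


# Toy data: two speakers with a few real and synthetic utterances each.
real = [
    Utterance("spk1", "open the door", "real/spk1_0.wav", False),
    Utterance("spk1", "lights on", "real/spk1_1.wav", False),
    Utterance("spk2", "call home", "real/spk2_0.wav", False),
]
synthetic = [
    Utterance("spk1", f"prompt {i}", f"synth/spk1_{i}.wav", True)
    for i in range(5)
] + [Utterance("spk2", "prompt 0", "synth/spk2_0.wav", True)]

train = build_augmented_set(real, synthetic, synth_per_real=1.0)
print(len(train), sum(u.synthetic for u in train))
```

The resulting list would then be fed to whatever fine-tuning pipeline is in use. Capping the synthetic share per speaker is one possible design choice, intended to keep synthesized audio from dominating a speaker's real recordings; the paper may use a different scheme.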
Related papers
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022)
- Improving Accented Speech Recognition Using Data Augmentation Based On Unsupervised Text-to-speech Synthesis (2024)
- Tts-by-tts: Tts-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)
- TDASS: Target Domain Adaptation Speech Synthesis Framework For Multi-speaker Low-resource TTS (2022)
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020)
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)
- A Domain Adaptation Framework For Speech Recognition Systems With Only Synthetic Data (2025)
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)