Adversarial Data Augmentation Using VAE-GAN For Disordered Speech Recognition
2022 · Zengrui Jin, Xurong Xie, Mengzhe Geng, et al.
Abstract
Automatic recognition of disordered speech remains a highly challenging task. The underlying neuro-motor conditions, often compounded by co-occurring physical disabilities, make it difficult to collect the large quantities of impaired speech required for ASR system development. This paper presents novel variational auto-encoder generative adversarial network (VAE-GAN) based personalized disordered speech augmentation approaches that simultaneously learn to encode, generate and discriminate synthesized impaired speech. Separate latent features are derived to learn dysarthric speech characteristics and phoneme context representations. Self-supervised pre-trained Wav2vec 2.0 embedding features are also incorporated. Experiments conducted on the UASpeech corpus suggest that the proposed adversarial data augmentation approach consistently outperformed the baseline speed perturbation and non-VAE GAN augmentation methods on trained hybrid TDNN and end-to-end Conformer systems. Afte
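The encode, generate and discriminate objectives named in the abstract can be illustrated with a minimal sketch of generic VAE-GAN loss terms. This is not the authors' formulation: the abstract does not give the objective, so the mean-squared-error reconstruction term, the non-saturating adversarial loss, the loss weights `kl_weight` and `adv_weight`, and all function names are assumptions chosen for illustration. The sketch also uses a single latent vector, whereas the paper derives separate latents for dysarthric speech characteristics and phoneme context.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regulariser."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def reconstruction_loss(x, x_hat):
    """Mean squared error between input and generated features (assumed)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def bce(p, target):
    """Binary cross-entropy for one discriminator probability p."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def vae_gan_losses(x, x_hat, mu, logvar, d_real, d_fake,
                   kl_weight=1.0, adv_weight=1.0):
    """Combine the three roles: encode (KL), generate (recon + adv), discriminate.

    d_real and d_fake are the discriminator's probabilities that the real
    and the generated features are real. The weights are hypothetical.
    """
    kl = kl_to_standard_normal(mu, logvar)
    rec = reconstruction_loss(x, x_hat)
    gen_adv = bce(d_fake, 1.0)                   # generator wants "real"
    disc = bce(d_real, 1.0) + bce(d_fake, 0.0)   # discriminator separates both
    return {
        "generator": rec + kl_weight * kl + adv_weight * gen_adv,
        "discriminator": disc,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.0, 0.0], [0.0, 0.0], rng)
    print(len(z), vae_gan_losses([0.0, 1.0], [0.1, 0.9], [0.0, 0.0],
                                 [0.0, 0.0], d_real=0.8, d_fake=0.3))
```

In a real system the encoder, generator and discriminator would be neural networks and the two loss values would be minimised in alternation; the sketch only shows how the terms combine.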
Related papers
- Personalized Adversarial Data Augmentation For Dysarthric And Elderly Speech Recognition (2022)
- Spectro-temporal Deep Features For Disordered Speech Assessment And Recognition (2022)
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018)
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017)
- Variational Auto-encoder Based Variability Encoding For Dysarthric Speech Recognition (2022)
- VSEGAN: Visual Speech Enhancement Generative Adversarial Network (2021)
- Towards Generalized Speech Enhancement With Generative Adversarial Networks (2019)
- Robust Speech Recognition Using Generative Adversarial Networks (2017)