Best Of Both Worlds: Robust Accented Speech Recognition With Adversarial Transfer Learning
2021 · Nilaksh Das, Sravan Bodapati, Monica Sunkara, et al.
Abstract
Training deep neural networks for automatic speech recognition (ASR) requires large amounts of transcribed speech. This becomes a bottleneck for training robust models for accented speech, which typically contains high variability in pronunciation and other semantics, since obtaining large amounts of annotated accented data is both tedious and costly. Often, we only have access to large amounts of unannotated speech from different accents. In this work, we leverage this unannotated data to provide semantic regularization to an ASR model that has been trained only on one accent, to improve its performance for multiple accents. We propose Accent Pre-Training (Acc-PT), a semi-supervised training strategy that combines transfer learning and adversarial training. Our approach improves the performance of a state-of-the-art ASR model by 33% on average over the baseline across multiple accents, training only on annotated samples from one standard accent, and as little as 105 minutes of unannota
Related papers
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021)
- AIPNet: Generative Adversarial Pre-training Of Accent-invariant Networks For End-to-end Speech Recognition (2019)
- Accent Conversion In Text-to-speech Using Multi-level VAE And Adversarial Training (2024)
- INTapt: Information-theoretic Adversarial Prompt Tuning For Enhanced Non-native Speech Recognition (2023)
- Audio Adversarial Examples For Robust Hybrid CTC/Attention Speech Recognition (2020)
- Synthetic Cross-accent Data Augmentation For Automatic Speech Recognition (2023)
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018)
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018)