Robust Automatic Speech Recognition Via Wavaugment Guided Phoneme Adversarial Training
2023 · Gege Qi, Yuefeng Chen, Xiaofeng Mao, et al.
Abstract
Developing a practically robust automatic speech recognition (ASR) system is challenging because the model must not only maintain its original performance on clean samples but also perform consistently under small volume perturbations and large domain shifts. To address this problem, we propose a novel WavAugment Guided Phoneme Adversarial Training (WAPAT). WAPAT uses adversarial examples in phoneme space as augmentation, making the model invariant to minor fluctuations in phoneme representation while preserving performance on clean samples. In addition, WAPAT uses the phoneme representation of augmented samples to guide the generation of adversaries, which helps find more stable and diverse gradient directions and results in improved generalization. Extensive experiments demonstrate the effectiveness of WAPAT on the End-to-end Speech Challenge Benchmark (ESB). Notably, SpeechLM-WAPAT outperforms the original model by a 6.28% WER reduction on ESB, achieving a new state of the art.
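The headline result is reported as a word error rate (WER) reduction. As a reading aid, here is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `wer` and the whitespace tokenization are assumptions made for this illustration; they are not taken from the paper, and benchmark evaluations typically apply text normalization before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. The abstract does not state whether the 6.28% figure is an absolute or a relative reduction, so this sketch makes no assumption about that.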
Related papers
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021)
- Adversarial Data Augmentation Using VAE-GAN For Disordered Speech Recognition (2022)
- PhasePerturbation: Speech Data Augmentation Via Phase Perturbation For Automatic Speech Recognition (2023)
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018)
- Whisper Turns Stronger: Augmenting Wav2vec 2.0 For Superior ASR In Low-resource Languages (2024)
- PATCorrect: Non-autoregressive Phoneme-augmented Transformer For ASR Error Correction (2023)
- Unsupervised Domain Adaptation For Robust Speech Recognition Via Variational Autoencoder-based Data Augmentation (2017)
- Audio Adversarial Examples For Robust Hybrid CTC/Attention Speech Recognition (2020)