AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition
2019 · Yi-Chen Chen, Zhaojun Yang, Ching-Feng Yeh, et al.
Abstract
As one of the major sources of speech variability, accents pose a grand challenge to the robustness of speech recognition systems. In this paper, our goal is to build a unified end-to-end speech recognition system that generalizes well across accents. For this purpose, we propose AIPNet (Accent Invariant Pre-training Networks), a novel pre-training framework based on generative adversarial nets (GAN) for accent-invariant representation learning. We pre-train AIPNet to disentangle accent-invariant and accent-specific characteristics from acoustic features through adversarial training on accented data, for which transcriptions are not necessarily available. We then fine-tune AIPNet by connecting the accent-invariant module to an attention-based encoder-decoder model for multi-accent speech recognition. In the experiments, our approach is compared against four baselines, including both accent-dependent and accent-independent models. Experimental results on 9 English accents show th
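The adversarial idea in the abstract can be illustrated with a small sketch. The abstract does not give AIPNet's actual objective, so everything here is an assumption for illustration: the function names are hypothetical, and the "push the discriminator toward chance" loss is one common way to encourage invariance, not necessarily the one the authors use. An accent discriminator is trained to identify the accent from the representation, while the accent-invariant encoder is trained so that the discriminator cannot do better than guessing among the 9 accents.

```python
import math

NUM_ACCENTS = 9  # the abstract reports experiments on 9 English accents


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(accent_logits, accent_label):
    """Cross-entropy minimized by the (hypothetical) accent discriminator.

    accent_logits: discriminator scores computed from the encoder output.
    accent_label:  index of the true accent.
    """
    probs = softmax(accent_logits)
    return -math.log(probs[accent_label])


def invariance_loss(accent_logits):
    """Loss minimized by the (hypothetical) accent-invariant encoder.

    Cross-entropy between a uniform distribution over accents and the
    discriminator's prediction. It reaches its minimum, log(K), exactly
    when the discriminator's output is uniform, i.e. when the
    representation carries no usable accent information.
    """
    probs = softmax(accent_logits)
    return -sum(math.log(p) for p in probs) / len(probs)


# A discriminator at chance versus one that confidently picks accent 0.
at_chance = [0.0] * NUM_ACCENTS
confident = [5.0] + [0.0] * (NUM_ACCENTS - 1)

chance_loss = invariance_loss(at_chance)
confident_loss = invariance_loss(confident)
```

In this sketch `chance_loss` equals `math.log(NUM_ACCENTS)`, the lowest value the encoder can reach, while `confident_loss` is larger, so gradient steps on the encoder would move the representation toward one from which accents cannot be told apart.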
Related papers
- Best of Both Worlds: Robust Accented Speech Recognition with Adversarial Transfer Learning (2021) · 9.23
- Robust Speech Recognition Using Generative Adversarial Networks (2017) · 11.29
- Channel-aware Domain-adaptive Generative Adversarial Network for Robust Speech Recognition (2024) · 4.52
- Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks (2021) · 5.84
- Improving Accent Conversion with Reference Encoder and End-to-end Text-to-speech (2020) · 0.00
- Layer-wise Fast Adaptation for End-to-end Multi-accent Speech Recognition (2022) · 9.76
- Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition (2023) · 0.00
- Adversarial Joint Training with Self-attention Mechanism for Robust End-to-end Speech Recognition (2021) · 0.00