Fine-tuning Of Pre-trained End-to-end Speech Recognition With Generative Adversarial Networks
2021 · Md Akmal Haidar, Mehdi Rezagholizadeh
Abstract
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GANs) has recently been explored for low-resource ASR corpora. GANs help to learn the true data representation through a two-player min-max game. However, training an E2E ASR model on a large ASR corpus within a GAN framework has never been explored, because it could take an excessively long time due to high-variance gradient updates and could face convergence issues. In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective, where the ASR model acts as the generator and a discriminator tries to distinguish the ASR output from the real data. Since the ASR model is pre-trained, we hypothesize that its output (soft distribution vectors) obtains higher scores from the discriminator and makes the discriminator's task harder within our GAN framework, which in turn improves the performance of the ASR model in the fine-tuning stage. Here, the
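The generator/discriminator setup described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the linear discriminator, the one-hot encoding of the real transcript, the standard (non-saturating) GAN losses, and all function names are hypothetical, and the abstract does not specify the paper's actual architecture or loss.

```python
import math

def softmax(logits):
    """Turn raw scores into a soft distribution vector over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, vocab_size):
    """Encode a real transcript token as a one-hot vector (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def discriminator_score(sequence, weights, bias):
    """Hypothetical linear discriminator: average a per-position linear
    score over the sequence, then squash it to (0, 1) with a sigmoid."""
    mean_logit = sum(
        sum(w * p for w, p in zip(weights, vec)) for vec in sequence
    ) / len(sequence) + bias
    return 1.0 / (1.0 + math.exp(-mean_logit))

def gan_losses(real_sequence, generated_sequence, weights, bias):
    """Standard GAN losses (an assumption, not the paper's stated objective).
    The discriminator wants a high score on real data and a low score on
    ASR output; the generator (the ASR model) wants a high score on its output."""
    d_real = discriminator_score(real_sequence, weights, bias)
    d_fake = discriminator_score(generated_sequence, weights, bias)
    d_loss = -(math.log(d_real) + math.log(1.0 - d_fake))
    g_loss = -math.log(d_fake)
    return d_loss, g_loss

vocab_size = 4
# "Real data": the reference transcript as one-hot vectors.
real = [one_hot(2, vocab_size), one_hot(0, vocab_size)]
# "ASR output": soft distribution vectors from made-up pre-trained logits.
asr_logits = [[0.1, 0.2, 3.0, 0.0], [2.5, 0.3, 0.1, 0.2]]
generated = [softmax(frame) for frame in asr_logits]

# An untrained discriminator (all-zero parameters) cannot tell the two apart.
weights = [0.0] * vocab_size
bias = 0.0
d_loss, g_loss = gan_losses(real, generated, weights, bias)
print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

In a fine-tuning loop of this kind, the two losses would be minimized in alternation. The sketch only evaluates them once and performs no gradient updates.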
Related papers
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021)
- Efficient Acoustic Feature Transformation In Mismatched Environments Using A Guided-gan (2022)
- Robust Speech Recognition Using Generative Adversarial Networks (2017)
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017)
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024)
- High Fidelity Speech Synthesis With Adversarial Networks (2019)
- Investigating Generative Adversarial Networks Based Speech Dereverberation For Robust Speech Recognition (2018)
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017)