Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition
2024 · Chien-Chun Wang, Li-Wei Chen, Cheng-Kang Chou, et al.
Abstract
While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To mitigate this issue, we propose a novel channel-aware data simulation method for robust ASR training. Our method harnesses the synergistic power of channel-extractive techniques and generative adversarial networks (GANs). We first train a channel encoder capable of extracting embeddings from arbitrary audio. On top of this, channel embeddings are extracted using a minimal amount of target-domain data and used to guide a GAN-based speech synthesizer. This synthesizer generates speech that faithfully preserves the phonetic content of the input while mimicking the channel characteristics of the target domain. We evaluate our method on the challenging Hakka Across Taiwan (HAT) and Taiwanese Across Taiwan (TAT) corpora, achieving relative charac
Related papers
- Robust Speech Recognition Using Generative Adversarial Networks (2017) 11.29
- Efficient Acoustic Feature Transformation In Mismatched Environments Using A Guided-GAN (2022) 2.26
- Exploring Speech Enhancement With Generative Adversarial Networks For Robust Speech Recognition (2017) 16.14
- Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models (2019) 9.23
- Investigating Generative Adversarial Networks Based Speech Dereverberation For Robust Speech Recognition (2018) 10.74
- Effective Noise-aware Data Simulation For Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation (2024) 3.58
- Fine-tuning Of Pre-trained End-to-end Speech Recognition With Generative Adversarial Networks (2021) 5.84
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021) 0.00