Effective Noise-aware Data Simulation For Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation
2024 · Chien-Chun Wang, Li-Wei Chen, Hung-Shin Lee, et al.
Abstract
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs) with only limited target noisy speech data. Notably, our method employs a noise encoder to extract noise embeddings from target-domain data. These embeddings aptly guide the generator to synthesize utterances acoustically fitted to the target domain while authentically preserving the phonetic content of the input clean speech. Furthermore, we introduce the notion of dynamic stochastic perturbation, which can inject controlled perturbations into the noise embeddings during inference, thereby enabling the model to generalize well to unseen noise conditions. Experiments on the VoiceBank-DEMAND benchmark dataset demon
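As a rough illustration of the dynamic stochastic perturbation idea described in the abstract, the sketch below adds random noise to a noise embedding at inference time. The abstract does not specify the form of the perturbation, so everything here is an assumption made for illustration: the function name `perturb_embedding`, the `max_scale` parameter, the choice of zero-mean Gaussian noise, and the per-call sampling of the noise scale are all hypothetical and are not taken from the paper.

```python
import random


def perturb_embedding(embedding, max_scale=0.1, rng=None):
    """Return a perturbed copy of a noise embedding (illustrative only).

    Assumptions, not the paper's method:
    - the perturbation is additive, zero-mean Gaussian noise;
    - the noise scale is redrawn uniformly from [0, max_scale] on each call,
      which is one possible reading of "dynamic".
    """
    rng = rng or random.Random()
    # Draw a fresh perturbation strength for this call.
    scale = rng.uniform(0.0, max_scale)
    # Add independent Gaussian noise to every embedding dimension.
    return [x + rng.gauss(0.0, scale) for x in embedding]


if __name__ == "__main__":
    # Hypothetical 4-dimensional embedding from a noise encoder.
    noise_embedding = [0.5, -1.0, 2.0, 0.25]
    print(perturb_embedding(noise_embedding, max_scale=0.1, rng=random.Random(0)))
```

In this sketch, `max_scale` bounds how far the perturbed embedding can drift from the extracted one: setting it to 0 returns the embedding unchanged, while larger values would expose the generator to a wider range of simulated noise conditions.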
Related papers
- Channel-aware Domain-adaptive Generative Adversarial Network For Robust Speech Recognition (2024) · 4.52
- SEGAN: Speech Enhancement Generative Adversarial Network (2017) · 21.85
- Conditional Generative Adversarial Networks For Speech Enhancement And Noise-robust Speaker Verification (2017) · 16.03
- Noise-aware Speech Enhancement Using Diffusion Probabilistic Model (2023) · 8.82
- Property-aware Multi-speaker Data Simulation: A Probabilistic Modelling Technique For Synthetic Data Generation (2023) · 6.34
- Unsupervised Speech Enhancement With Deep Dynamical Generative Speech And Noise Models (2023) · 0.00
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018) · 9.23
- The Potential Of Neural Speech Synthesis-based Data Augmentation For Personalized Speech Enhancement (2022) · 6.77