Incorporating Real-world Noisy Speech In Neural-network-based Speech Enhancement Systems
2021 · Yangyang Xia, Buye Xu, Anurag Kumar
Abstract
Supervised speech enhancement relies on parallel databases of degraded speech signals and their clean reference signals during training. This setting prohibits the use of real-world degraded speech data that may better represent the scenarios where such systems are used. In this paper, we explore methods that enable supervised speech enhancement systems to train on real-world degraded speech data. Specifically, we propose a semi-supervised approach for speech enhancement in which we first train a modified vector-quantized variational autoencoder that solves a source separation task. We then use this trained autoencoder to further train an enhancement network using real-world noisy speech data by computing a triplet-based unsupervised loss function. Experiments show promising results for incorporating real-world data in training speech enhancement systems.
Related papers
- Analysis Of DNN Speech Signal Enhancement For Robust Speaker Recognition (2018) 11.39
- Statistical Speech Enhancement Based On Probabilistic Integration Of Variational Autoencoder And Non-negative Matrix Factorization (2017) 15.00
- Semi-supervised Multichannel Speech Enhancement With Variational Autoencoders And Non-negative Matrix Factorization (2018) 12.25
- Guided Variational Autoencoder For Speech Enhancement With A Supervised Classifier (2021) 8.60
- Unsupervised Feature Enhancement For Speaker Verification (2019) 5.84
- Adversarial Feature Learning And Unsupervised Clustering Based Speech Synthesis For Found Data With Acoustic And Textual Noise (2020) 7.16
- Unsupervised Speech Enhancement With Speech Recognition Embedding And Disentanglement Losses (2021) 8.35
- Investigation Of Speech And Noise Latent Representations In Single-channel VAE-based Speech Enhancement (2025) 0.00