Unsupervised Speech Enhancement With Speech Recognition Embedding And Disentanglement Losses
2021 · Viet Anh Trinh, Sebastian Braun
Abstract
Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods, which pose two significant challenges. First, the majority of training datasets for speech enhancement systems are synthetic. When clean speech and noise corpora are mixed to create synthetic datasets, domain mismatches arise between synthetic and real-world recordings of noisy speech or audio. Second, there is a trade-off between improving speech enhancement performance and degrading automatic speech recognition (ASR) performance. We therefore propose an unsupervised loss function to tackle these two problems. Our function extends the MixIT loss function with speech recognition embedding and disentanglement losses. Our results show that the proposed function effectively improves speech enhancement performance compared to a baseline trained in a supervised way on the noisy VoxCeleb dataset. While full