Incorporating Multi-target In Multi-stage Speech Enhancement Model For Better Generalization
2021 · Lu Zhang, Mingjiang Wang, Andong Li, et al.
Abstract
Recent single-channel speech enhancement methods based on deep neural networks (DNNs) have achieved remarkable results, but they still generalize poorly in real-world scenes. Like other data-driven methods, DNN-based speech enhancement models degrade significantly on data not seen during training. In this study, we exploit the contribution of multi-target joint learning to model generalization, and propose a lightweight, low-computation dilated convolutional network (DCN) model for more robust speech denoising. Our goal is to integrate the masking target, the mapping target, and the parameters of the traditional speech enhancement estimator into a single DCN model to maximize their complementary advantages. To do this, we build a multi-stage learning framework, named 'MT-in-MS', that handles the multiple targets stage by stage to achieve their joint learning. Our experimental results show that compared with the state-of-the-art time domain and time-frequency
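The difference between the masking and mapping targets named in the abstract can be illustrated with a minimal NumPy sketch. This is not the MT-in-MS model: the function names, the toy spectrogram shapes, and the assumption that clean and noise magnitudes simply add are all hypothetical choices made for illustration.

```python
import numpy as np

def masking_target(clean_mag, noisy_mag, eps=1e-8):
    # Masking: the network would predict a per-bin gain in [0, 1]
    # that is multiplied with the noisy magnitude spectrogram.
    return np.clip(clean_mag / (noisy_mag + eps), 0.0, 1.0)

def mapping_target(clean_mag):
    # Mapping: the network would predict the clean magnitude directly.
    return clean_mag

def apply_mask(mask, noisy_mag):
    # Enhanced magnitude obtained from a (predicted or ideal) mask.
    return mask * noisy_mag

# Toy data with shape (frames, frequency bins); values are random, not real speech.
rng = np.random.default_rng(0)
clean_mag = rng.uniform(0.1, 1.0, size=(4, 5))
noise_mag = rng.uniform(0.0, 0.5, size=(4, 5))
noisy_mag = clean_mag + noise_mag  # simplifying assumption: magnitudes add

mask = masking_target(clean_mag, noisy_mag)
recovered = apply_mask(mask, noisy_mag)
direct = mapping_target(clean_mag)
```

With the ideal mask, `recovered` matches `clean_mag` up to the small `eps` term, so both targets describe the same clean signal. They differ in what a network has to output: a bounded gain in one case, an unbounded magnitude in the other.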
Related papers
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021) 0.00
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020) 0.00
- Multichannel Speech Enhancement Without Beamforming (2021) 9.41
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020) 0.00
- Multi-modal Hybrid Deep Neural Network For Speech Enhancement (2016) 0.00
- Speech Enhancement Using Multi-stage Self-attentive Temporal Convolutional Networks (2021) 14.15
- On The Role Of Spatial, Spectral, And Temporal Processing For DNN-based Non-linear Multi-channel Speech Enhancement (2022) 7.81
- Monaural Speech Enhancement Using A Multi-branch Temporal Convolutional Network (2019) 3.58