Self-supervised Learning Based Monaural Speech Enhancement With Multi-task Pre-training
2021 · Yi Li, Yang Sun, Syed Mohsen Naqvi
Abstract
In self-supervised learning, it is challenging to reduce the gap between the enhancement performance on the estimated and target speech signals with existing pre-tasks. In this paper, we propose a multi-task pre-training method to improve speech enhancement performance with self-supervised learning. Within the pre-training autoencoder (PAE), only a limited set of clean speech signals is required to learn their latent representations. Meanwhile, to overcome the limitation of a single pre-task, the proposed masking module exploits the dereverberated mask and the estimated ratio mask to denoise the mixture as the second pre-task. Unlike the PAE, where the target speech signals are estimated, the downstream task autoencoder (DAE) utilizes a large number of unlabeled and unseen reverberant mixtures to generate the estimated mixtures. The trained DAE is shared by the learned representations and masks. Experimental results on a benchmark dataset demonstrate that the proposed method outperf
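The abstract does not define the dereverberated mask or the estimated ratio mask, so the following is only a minimal sketch of the general idea of ratio masking, not the paper's masking module. It uses the textbook ideal ratio mask computed from magnitude spectrograms. The function names `ratio_mask` and `apply_mask`, the array shapes, and the synthetic random "spectrograms" are all assumptions made for illustration.

```python
import numpy as np


def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Textbook ideal ratio mask (hypothetical helper, not the paper's ERM).

    Each time-frequency bin gets a value in [0, 1] equal to the square root
    of the speech power divided by the total (speech + noise) power.
    """
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(speech_pow / (speech_pow + noise_pow + eps))


def apply_mask(mixture_mag, mask):
    """Scale each time-frequency bin of the mixture by the mask."""
    return mask * mixture_mag


# Synthetic stand-ins for magnitude spectrograms (frequency bins x frames).
rng = np.random.default_rng(0)
speech_mag = rng.random((257, 100))
noise_mag = 0.5 * rng.random((257, 100))

# Assumption: speech and noise are uncorrelated, so their powers add.
mixture_mag = np.sqrt(speech_mag ** 2 + noise_mag ** 2)

mask = ratio_mask(speech_mag, noise_mag)
enhanced_mag = apply_mask(mixture_mag, mask)
```

Under the additive-power assumption used to build `mixture_mag`, the masked mixture recovers the speech magnitude up to the small `eps` term. In a learned system the mask would be predicted from the mixture alone rather than computed from known speech and noise.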
Related papers
- Feature Learning And Ensemble Pre-tasks Based Self-supervised Speech Denoising And Dereverberation (2022)
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022)
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021)
- Pretext Tasks Selection For Multitask Self-supervised Speech Representation Learning (2021)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)
- Weakly-supervised Speech Pre-training: A Case Study On Target Speech Recognition (2023)
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020)
- Speech Enhancement Aided End-to-end Multi-task Learning For Voice Activity Detection (2020)