SuperM2M: Supervised and Mixture-to-Mixture Co-Learning for Speech Enhancement and Noise-Robust ASR
2024 · Zhong-Qiu Wang
Abstract
The current dominant approach for neural speech enhancement is based on supervised learning by using simulated training data. The trained models, however, often exhibit limited generalizability to real-recorded data. To address this, this paper investigates training enhancement models directly on real target-domain data. We propose to adapt mixture-to-mixture (M2M) training, originally designed for speaker separation, for speech enhancement, by modeling multi-source noise signals as a single, combined source. In addition, we propose a co-learning algorithm that improves M2M with the help of supervised algorithms. When paired close-talk and far-field mixtures are available for training, M2M realizes speech enhancement by training a deep neural network (DNN) to produce speech and noise estimates in a way such that they can be linearly filtered to reconstruct the close-talk and far-field mixtures. This way, the DNN can be trained directly on real mixtures, and can leverage close-talk and
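The reconstruction idea in the abstract (speech and noise estimates that are linearly filtered so that they add up to each recorded mixture) can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes STFT-domain estimates of shape (frequency, time), a short multi-frame filter per frequency, and a plain joint least-squares fit as a stand-in for whatever filter estimation the paper actually uses. The names `stack_taps`, `filtered_reconstruction`, `m2m_style_loss`, and the `taps` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def stack_taps(X, taps):
    """Stack the current and (taps - 1) past frames of an (F, T) STFT.

    Returns an array of shape (F, T, taps); frames before t = 0 are zero.
    """
    F, T = X.shape
    out = np.zeros((F, T, taps), dtype=X.dtype)
    for k in range(taps):
        out[:, k:, k] = X[:, :T - k]
    return out


def filtered_reconstruction(est_speech, est_noise, mixture, taps=3):
    """Fit per-frequency linear filters so the filtered estimates sum to `mixture`.

    Assumption: a joint least-squares fit over both sources, used here only
    as a simple stand-in for the filter estimation in the paper.
    """
    # Regressors: delayed copies of both estimates, shape (F, T, 2 * taps).
    A = np.concatenate(
        [stack_taps(est_speech, taps), stack_taps(est_noise, taps)], axis=-1
    )
    recon = np.empty_like(mixture)
    for f in range(mixture.shape[0]):
        g, *_ = np.linalg.lstsq(A[f], mixture[f], rcond=None)
        recon[f] = A[f] @ g
    return recon


def m2m_style_loss(est_speech, est_noise, close_talk, far_field, taps=3):
    """Sum of mean squared reconstruction errors on both mixtures."""
    loss = 0.0
    for mix in (close_talk, far_field):
        recon = filtered_reconstruction(est_speech, est_noise, mix, taps)
        loss += float(np.mean(np.abs(mix - recon) ** 2))
    return loss
```

In this sketch the loss is close to zero when the two estimates can explain both mixtures through short linear filters, and grows when they cannot. In actual training, the same quantity would be computed in an automatic-differentiation framework so that gradients reach the DNN that produces the estimates; the NumPy version only evaluates the value.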
Related papers
- ctPuLSE: Close-talk, And Pseudo-label Based Far-field, Speech Enhancement (2024) 0.00
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018) 10.21
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) 0.00
- Boosting Noise Robustness Of Acoustic Model Via Deep Adversarial Training (2018) 9.23
- Robust Data2vec: Noise-robust Speech Representation Learning For ASR By Combining Regression And Improved Contrastive Learning (2022) 9.76
- Masked Modeling Duo For Speech: Specializing General-purpose Audio Representation To Speech Using Denoising Distillation (2023) 7.94
- On Monoaural Speech Enhancement For Automatic Recognition Of Real Noisy Speech Using Mixture Invariant Training (2022) 4.52
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019) 13.23