ctPuLSE: Close-talk, And Pseudo-label Based Far-field, Speech Enhancement
2024 · Zhong-Qiu Wang
Abstract
The currently dominant approach to neural speech enhancement is purely supervised deep learning on simulated pairs of far-field noisy-reverberant speech (i.e., mixtures) and clean speech. The trained models, however, often generalize poorly to real-recorded mixtures. To address this, this paper investigates training enhancement models directly on real mixtures. A major difficulty with this approach is that, since the clean speech underlying real mixtures is unavailable, no good supervision signal exists for them. In this context, assuming that a training set consisting of real-recorded pairs of close-talk and far-field mixtures is available, we propose to address this difficulty via close-talk speech enhancement, where an enhancement model is first trained on simulated mixtures to enhance real-recorded close-talk mixtures, and the estimated close-talk speech can then be used as supervision (i.e., a pseudo-label) for training far-field speech enhance
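The pipeline outlined in the abstract (enhance the close-talk mixture, then use the estimate as a pseudo-label for the paired far-field mixture) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in rather than the paper's method: `moving_average_enhancer` takes the place of a model pre-trained on simulated mixtures, `fit_gain` takes the place of far-field model training, and the paired signals are assumed to be time-aligned and of equal length.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def moving_average_enhancer(mixture: Sequence[float], width: int = 3) -> Signal:
    """Hypothetical stand-in for an enhancement model trained on simulated data."""
    half = width // 2
    out = []
    for i in range(len(mixture)):
        # Average over a window centered at i, clipped at the signal edges.
        window = mixture[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def make_pseudo_labels(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
    close_talk_enhancer: Callable[[Sequence[float]], Signal],
) -> List[Tuple[Sequence[float], Signal]]:
    """Enhance each close-talk mixture and pair the estimate with the
    far-field mixture, giving (far_field_input, pseudo_label) examples."""
    return [(far, close_talk_enhancer(close)) for close, far in pairs]


def fit_gain(training_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Toy far-field 'model': the scalar gain g that minimizes
    sum((g * far - label) ** 2) over all samples (closed-form least squares)."""
    num = sum(f * l for far, label in training_pairs for f, l in zip(far, label))
    den = sum(f * f for far, _ in training_pairs for f in far)
    return num / den if den else 0.0


# One real-recorded (close_talk_mixture, far_field_mixture) pair; toy values.
real_pairs = [([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5])]
pseudo_labeled = make_pseudo_labels(real_pairs, moving_average_enhancer)
gain = fit_gain(pseudo_labeled)
```

The point of the sketch is the data flow: no clean reference signal appears anywhere, and the far-field stage is supervised only by the output of the close-talk stage.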
Related papers
- SuperM2M: Supervised And Mixture-to-mixture Co-learning For Speech Enhancement And Noise-robust ASR (2024) 5.24
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020) 0.00
- Closing The Gap Between Time-domain Multi-channel Speech Enhancement On Real And Simulation Conditions (2021) 8.82
- Single-channel Speech Enhancement Using Learnable Loss Mixup (2023) 0.00
- Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform (2021) 0.00
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement (2022) 0.00
- Dynamic Acoustic Compensation And Adaptive Focal Training For Personalized Speech Enhancement (2022) 4.52
- Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild (2020) 0.00