Self-supervised Learning With Diffusion-based Multichannel Speech Enhancement For Speaker Verification Under Noisy Conditions
2023 · Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, et al.
Abstract
The paper introduces Diff-Filter, a multichannel speech enhancement approach based on a diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-stage training procedure that takes advantage of self-supervised learning. In the first stage, the Diff-Filter is trained to perform time-domain speech filtering using a score-based diffusion model. In the second stage, the Diff-Filter is jointly optimized with a pre-trained ECAPA-TDNN speaker verification model under a self-supervised learning framework. The paper also proposes a novel loss based on the equal error rate (EER), which is used to conduct self-supervised learning on a dataset that has no speaker labels. The proposed approach is evaluated on MultiSV, a multichannel speaker verification dataset, and shows significant improvements in performance under noisy multichannel conditions.
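The abstract does not give the form of the EER-based loss, so the sketch below shows only how the standard equal error rate metric is usually computed from verification trial scores. It is not the paper's loss. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy trial data are all illustrative assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER of a set of speaker verification trials.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns (eer, threshold) at the observed score where the false
    acceptance and false rejection rates are closest.
    """
    target = [s for s, y in zip(scores, labels) if y == 1]
    nontarget = [s for s, y in zip(scores, labels) if y == 0]
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target trial")

    best = None
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(scores)):
        # A trial is accepted when its score is >= the threshold.
        far = sum(s >= t for s in nontarget) / len(nontarget)  # false acceptances
        frr = sum(s < t for s in target) / len(target)         # false rejections
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy trials (hypothetical values): one target scores below one non-target.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

Computed this way, the EER is a count over hard accept/reject decisions and is therefore not differentiable. A loss built on it would presumably need some smooth surrogate, whose details the abstract does not state.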
Related papers
- Diff-SV: A Unified Hierarchical Framework For Noise-robust Speaker Verification Using Score-based Diffusion Probabilistic Models (2023)
- Diffusion-based Adversarial Purification For Speaker Verification (2023)
- How To Leverage DNN-based Speech Enhancement For Multi-channel Speaker Verification? (2022)
- Cold Diffusion For Speech Enhancement (2022)
- Unsupervised Feature Enhancement For Speaker Verification (2019)
- Feature Enhancement With Deep Feature Losses For Speaker Verification (2019)
- VoiceExtender: Short-utterance Text-independent Speaker Verification With Guided Diffusion Model (2023)
- Diffusion-based Unsupervised Audio-visual Speech Enhancement (2024)