DDTSE: Discriminative Diffusion Model For Target Speech Extraction
2023 · Leying Zhang, Yao Qian, Linfeng Yu, et al.
Abstract
Diffusion models have gained attention in speech enhancement tasks, providing an alternative to conventional discriminative methods. However, target speech extraction under multi-speaker noisy conditions remains relatively unexplored. Moreover, the superior quality of diffusion methods typically comes at the cost of slower inference. In this paper, we introduce the Discriminative Diffusion model for Target Speech Extraction (DDTSE). We apply the same forward process as diffusion models and use a reconstruction loss similar to that of discriminative methods. Furthermore, we devise a two-stage training strategy to emulate the inference process during model training. DDTSE not only works as a standalone system but can also further improve the performance of discriminative models without additional retraining. Experimental results demonstrate that DDTSE not only achieves higher perceptual quality but also accelerates the inference process by 3 times compared to the convent
Related papers
- Extract And Diffuse: Latent Integration For Improved Diffusion-based Speech And Vocal Enhancement (2024) · 0.00
- Adversarial Training Of Denoising Diffusion Model Using Dual Discriminators For High-fidelity Multi-speaker TTS (2023) · 2.26
- GDiffuSE: Diffusion-based Speech Enhancement With Noise Model Guidance (2025) · 0.00
- Single And Few-step Diffusion For Generative Speech Enhancement (2023) · 10.21
- Investigating The Design Space Of Diffusion Models For Speech Enhancement (2023) · 10.07
- Multi-GradSpeech: Towards Diffusion-based Multi-speaker Text-to-speech Using Consistent Diffusion Models (2023) · 0.00
- Diffusion-based Signal Refiner For Speech Enhancement And Separation (2023) · 2.26
- Noise-aware Speech Enhancement Using Diffusion Probabilistic Model (2023) · 8.82