Magnitude-phase Dual-path Speech Enhancement Network Based On Self-supervised Embedding And Perceptual Contrast Stretch Boosting
2025 · Alimjan Mattursun, Liejun Wang, Yinfeng Yu, et al.
Abstract
Speech self-supervised learning (SSL) has made great progress in various speech processing tasks, but there is still room for improvement in speech enhancement (SE). This paper presents BSP-MPNet, a dual-path framework that combines self-supervised features with magnitude-phase information for SE. The approach starts by applying the perceptual contrast stretching (PCS) algorithm to enhance the magnitude-phase spectrum. A magnitude-phase 2D coarse (MP-2DC) encoder then extracts coarse features from the enhanced spectrum. Next, a feature-separating self-supervised learning (FS-SSL) model generates self-supervised embeddings for the magnitude and phase components separately. These embeddings are fused to create cross-domain feature representations. Finally, two parallel RNN-enhanced multi-attention (REMA) mask decoders refine the features, apply them to the mask, and reconstruct the speech signal. We evaluate BSP-MPNet on the VoiceBank+DEMAND and WHAMR! datasets. Experimental results show
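The signal-level steps named in the abstract (magnitude-phase decomposition, a PCS-style stretch, masking, and recombination) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the toy spectrogram, and the per-bin weights are hypothetical placeholders rather than the published PCS coefficients. The sketch also assumes the stretch acts on the log-compressed magnitude only and leaves the phase unchanged, and it stands in an all-ones mask for the output of the learned REMA decoders.

```python
import numpy as np


def split_mag_phase(spec):
    """Split a complex spectrogram (freq_bins x frames) into magnitude and phase."""
    return np.abs(spec), np.angle(spec)


def contrast_stretch(mag, band_weights):
    """PCS-style stretch: scale the log1p magnitude per frequency bin, then invert.

    band_weights has shape (freq_bins,). The values used below are
    hypothetical placeholders, not the published PCS coefficients.
    """
    stretched_log = band_weights[:, None] * np.log1p(mag)
    return np.expm1(stretched_log)


def apply_mask_and_recombine(mag, phase, mask):
    """Apply a real-valued mask to the magnitude and rebuild the complex spectrum."""
    return (mag * mask) * np.exp(1j * phase)


# Toy complex spectrogram: 5 frequency bins, 4 frames (random stand-in data).
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))

mag, phase = split_mag_phase(spec)

# Hypothetical weights: 1.0 leaves a bin unchanged, values above 1.0 boost it.
weights = np.linspace(1.0, 1.2, num=5)
stretched = contrast_stretch(mag, weights)

# An all-ones mask stands in for a learned decoder output.
enhanced = apply_mask_and_recombine(stretched, phase, np.ones_like(mag))
```

With a weight of 1.0 the stretch reduces to the identity, so the first frequency bin passes through unchanged, while larger weights expand the dynamic range of the remaining bins.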
Related papers
- MP-SENet: A Speech Enhancement Model With Parallel Denoising Of Magnitude And Phase Spectra (2023) · 15.51
- Explicit Estimation Of Magnitude And Phase Spectra In Parallel For High-quality Speech Enhancement (2023) · 11.19
- Magnitude-and-phase-aware Speech Enhancement With Parallel Sequence Modeling (2023) · 3.58
- BSS-CFFMA: Cross-domain Feature Fusion And Multi-attention Speech Enhancement Network Based On Self-supervised Embedding (2024) · 4.52
- Exploiting Consistency-preserving Loss And Perceptual Contrast Stretching To Boost SSL-based Speech Enhancement (2024) · 6.77
- ESPnet-SE++: Speech Enhancement For Robust Speech Recognition, Translation, And Understanding (2022) · 18.72
- Investigating Self-supervised Learning For Speech Enhancement And Separation (2022) · 13.44
- Multi-objective Learning And Mask-based Post-processing For Deep Neural Network Based Speech Enhancement (2017) · 9.76