Multi-objective Learning And Mask-based Post-processing For Deep Neural Network Based Speech Enhancement
2017 · Yong Xu, Jun Du, Zhen Huang, et al.
Abstract
We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based post-processing in this work. A series of experiments show that joint LPS and MFCC learning improves the SE perf
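The joint objective described in the abstract can be sketched as a weighted sum of per-target errors. This is a minimal illustration, not the authors' implementation: the function names, the weights `alpha` and `beta`, the use of mean squared error for every target, and the 0 dB local-SNR threshold for the IBM are all assumptions made for the example.

```python
import math

def ideal_binary_mask(clean_power, noise_power, lc_db=0.0):
    """Label each time-frequency unit 1.0 if its local SNR exceeds lc_db, else 0.0.

    lc_db (the local criterion) is an assumed threshold, not a value from the paper.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    mask = []
    for s, n in zip(clean_power, noise_power):
        snr_db = 10.0 * math.log10((s + eps) / (n + eps))
        mask.append(1.0 if snr_db > lc_db else 0.0)
    return mask

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def multi_objective_loss(lps_hat, lps, mfcc_hat, mfcc, ibm_hat, ibm,
                         alpha=0.1, beta=0.1):
    """Primary LPS error plus weighted secondary errors (MFCC and IBM).

    alpha and beta are hypothetical weights on the secondary targets.
    """
    return mse(lps_hat, lps) + alpha * mse(mfcc_hat, mfcc) + beta * mse(ibm_hat, ibm)

# Usage: one frame with two frequency bins and a single MFCC coefficient.
ibm = ideal_binary_mask([4.0, 1.0], [1.0, 4.0])
loss = multi_objective_loss([1.0, 1.0], [0.0, 0.0], [2.0], [0.0], ibm, ibm)
```

Setting `alpha` and `beta` to zero reduces the sketch to direct LPS prediction, which makes the secondary targets' role as extra constraints on the shared parameters easy to see.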
Related papers
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement (2022) 0.00
- Magnitude-phase Dual-path Speech Enhancement Network Based On Self-supervised Embedding And Perceptual Contrast Stretch Boosting (2025) 3.21
- Reinforcement Learning Based Speech Enhancement For Robust Speech Recognition (2018) 11.08
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021) 0.00
- Aligning Generative Speech Enhancement With Perceptual Feedback (2025) 0.00
- VSANet: Real-time Speech Enhancement Based On Voice Activity Detection And Causal Spatial Attention (2023) 5.24
- A Lightweight Dual-stage Framework For Personalized Speech Enhancement Based On DeepFilterNet2 (2024) 2.26
- Multi-CMGAN+/+: Leveraging Multi-objective Speech Quality Metric Prediction For Speech Enhancement (2023) 0.00