Phase-aware Speech Enhancement With Deep Complex U-Net
2019 · Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, et al.
Abstract
Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of clean speech. To improve speech enhancement performance, we tackle the phase estimation problem in three ways. First, we propose Deep Complex U-Net, an advanced U-Net structured model incorporating well-defined complex-valued building blocks to deal with complex-valued spectrograms. Second, we propose a polar coordinate-wise complex-valued masking method to reflect the distribution of complex ideal ratio masks. Third, we define a novel loss function, weighted source-to-distortion ratio (wSDR) loss, which is designed to directly correlate with a quantitative evaluation measure. Our model was evaluated on a mixture of the Voice Bank corpus and DEMAND database, which has been widely used by many deep learning models for speech enhancement. Ablation experim
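The abstract names the polar coordinate-wise mask and the wSDR loss without giving formulas. As a minimal sketch, assuming a tanh-bounded mask magnitude with the phase taken from the raw network output, and an energy-weighted sum of negative cosine similarities for the speech and noise components, the two pieces might look like this in NumPy. The function names, the `eps` stabilizer, and the exact form of both formulas are assumptions made for illustration; they are not stated in the text above.

```python
import numpy as np


def polar_mask(raw: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical bounded complex mask built in polar coordinates.

    Assumption: the mask magnitude is tanh(|raw|), so it stays in [0, 1),
    and the mask phase is the phase of the raw complex network output.
    """
    mag = np.abs(raw)
    return np.tanh(mag) * raw / (mag + eps)


def neg_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Negative cosine similarity between two real 1-D waveforms."""
    return float(-np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def wsdr_loss(noisy: np.ndarray, clean: np.ndarray, estimate: np.ndarray,
              eps: float = 1e-8) -> float:
    """Hypothetical weighted SDR loss on time-domain signals.

    Assumption: the loss is alpha * L(clean, estimate)
    + (1 - alpha) * L(noise, noise_estimate), where L is the negative
    cosine similarity and alpha is the clean-speech share of the energy.
    """
    noise = noisy - clean              # true noise component
    noise_est = noisy - estimate       # noise implied by the estimate
    e_clean = np.sum(clean ** 2)
    e_noise = np.sum(noise ** 2)
    alpha = e_clean / (e_clean + e_noise + eps)
    return (alpha * neg_cosine(clean, estimate, eps)
            + (1.0 - alpha) * neg_cosine(noise, noise_est, eps))
```

Under this formulation, an estimate equal to the clean signal drives both cosine terms to their minimum, so the loss approaches -1; returning the noisy input unchanged gives a strictly larger value.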
Related papers
- Single-channel Speech Enhancement With Deep Complex U-networks And Probabilistic Latent Space Models (2023) 5.24
- DCCRN: Deep Complex Convolution Recurrent Network For Phase-aware Speech Enhancement (2020) 20.78
- Improved Speech Enhancement With The Wave-U-Net (2018) 0.00
- Phase Aware Speech Enhancement Using Realisation Of Complex-valued LSTM (2020) 0.00
- Explicit Estimation Of Magnitude And Phase Spectra In Parallel For High-quality Speech Enhancement (2023) 11.19
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021) 0.00
- Magnitude-and-phase-aware Speech Enhancement With Parallel Sequence Modeling (2023) 3.58
- PHASEN: A Phase-and-harmonics-aware Speech Enhancement Network (2019) 18.20