PHASEN: A Phase-and-harmonics-aware Speech Enhancement Network
2019 · Dacheng Yin, Chong Luo, Zhiwei Xiong, et al.
Abstract
Time-frequency (T-F) domain masking is a mainstream approach for single-channel speech enhancement. Recently, attention has turned to predicting phase in addition to amplitude. In this paper, we propose a phase-and-harmonics-aware deep neural network (DNN), named PHASEN, for this task. Unlike previous methods that directly use a complex ideal ratio mask to supervise DNN learning, we design a two-stream network in which an amplitude stream and a phase stream are dedicated to amplitude and phase prediction, respectively. We find that the two streams should communicate with each other, and that this communication is crucial to phase prediction. In addition, we propose frequency transformation blocks to capture long-range correlations along the frequency axis. Visualization shows that the learned transformation matrix spontaneously captures the harmonic correlation, which has been proven to be helpful for T-F spectrogram reconstruction. With these two innovations, PHASEN acquires the ability to handle detai…
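The abstract names three mechanisms: a frequency transformation that mixes information across distant frequency bins, communication between an amplitude stream and a phase stream, and reconstruction of the enhanced spectrogram from separate amplitude and phase predictions. The NumPy sketch below illustrates each one under stated assumptions, not as the authors' implementation. It assumes the frequency transformation is a single learned F x F matrix applied to every frame (the abstract mentions a "learned transformation matrix" but not its shape or placement), that stream communication is a multiplicative tanh gate (a hypothetical choice), and that the output spectrum is the noisy amplitude scaled by a predicted mask times a unit-magnitude phase estimate. All function names are illustrative.

```python
import numpy as np


def frequency_transform(feat, W):
    """Mix features along the frequency axis with one F x F matrix.

    Assumption: a single learned matrix W is shared by all channels and frames.
    feat: real array of shape (C, T, F) = (channels, frames, frequency bins).
    W:    array of shape (F, F); row k holds the weights output bin k assigns
          to every input bin, so bins far apart (e.g. harmonics) can be linked.
    """
    # out[c, t, k] = sum_j W[k, j] * feat[c, t, j]
    return np.einsum("kj,ctj->ctk", W, feat)


def cross_stream_gate(a, p):
    """Hypothetical stream communication: each stream is gated by the other.

    a, p: amplitude-stream and phase-stream features of the same shape.
    A trained model would likely transform the gating stream with learned
    layers first; this sketch uses the raw features to stay dependency-free.
    """
    return a * np.tanh(p), p * np.tanh(a)


def reconstruct(noisy_spec, amp_mask, phase_pred, eps=1e-8):
    """Combine a predicted amplitude mask with a predicted phase.

    noisy_spec: complex STFT of the noisy input, shape (T, F).
    amp_mask:   non-negative real mask, shape (T, F).
    phase_pred: complex phase estimate, shape (T, F); it is divided by its
                magnitude so that only its angle affects the output.
    """
    unit_phase = phase_pred / (np.abs(phase_pred) + eps)
    return np.abs(noisy_spec) * amp_mask * unit_phase


# Usage: reversing the frequency axis shows that W can connect any pair of bins.
rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 3, 5))
reversed_feat = frequency_transform(feat, np.eye(5)[::-1])
```

A dense F x F matrix is the simplest way to let every output bin draw on every input bin, which is what makes long-range (harmonic) correlations learnable; a convolution along frequency would only see a local neighborhood. Normalizing the phase estimate keeps amplitude and phase responsibilities separate, so the mask alone controls the output magnitude.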
Related papers
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021)
- Phase Reconstruction From Amplitude Spectrograms Based On Von-mises-distribution Deep Neural Network (2018)
- Explicit Estimation Of Magnitude And Phase Spectra In Parallel For High-quality Speech Enhancement (2023)
- DCCRN: Deep Complex Convolution Recurrent Network For Phase-aware Speech Enhancement (2020)
- Time-graph Frequency Representation With Singular Value Decomposition For Neural Speech Enhancement (2024)
- Phase-aware Speech Enhancement With Deep Complex U-net (2019)
- Magnitude-and-phase-aware Speech Enhancement With Parallel Sequence Modeling (2023)
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020)