Deep Learning Based Phase Reconstruction For Speaker Separation: A Trigonometric Perspective
2018 · Zhong-Qiu Wang, Ke Tan, Deliang Wang
Abstract
This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources whose magnitudes are accurately estimated, a geometric constraint uniquely determines the absolute phase difference between each source and the mixture; as a result, the source phases at each time-frequency (T-F) unit can be narrowed down to only two candidates. To pick the right candidate, we propose three algorithms based on iterative phase reconstruction, group delay estimation, and phase-difference sign prediction. State-of-the-art results are obtained on the publicly available wsj0-2mix and wsj0-3mix corpora.
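The two-candidate observation can be illustrated for a single T-F unit. The sketch below is not the authors' code: it assumes the geometric constraint is the triangle formed by the mixture and the two sources in the complex plane (mixture = source A + source B), so that the law of cosines gives each absolute phase difference. The function name `phase_candidates` and its arguments are hypothetical, and only the Python standard library is used.

```python
import cmath
import math


def phase_candidates(mix, mag_a, mag_b, eps=1e-12):
    """Return the two (a, b) complex pairs consistent with mix = a + b.

    mix   : complex STFT value of the mixture at one T-F unit
    mag_a : estimated magnitude of source A at that unit
    mag_b : estimated magnitude of source B at that unit
    """
    mag_y = max(abs(mix), eps)      # guard against division by zero
    theta_y = cmath.phase(mix)      # mixture phase

    # Law of cosines on the triangle with sides |Y|, |A|, |B|:
    # |B|^2 = |Y|^2 + |A|^2 - 2 |Y| |A| cos(delta_a), and symmetrically for B.
    cos_a = (mag_y**2 + mag_a**2 - mag_b**2) / (2 * mag_y * max(mag_a, eps))
    cos_b = (mag_y**2 + mag_b**2 - mag_a**2) / (2 * mag_y * max(mag_b, eps))

    # Clip so that slightly inconsistent magnitude estimates still give an angle.
    delta_a = math.acos(max(-1.0, min(1.0, cos_a)))
    delta_b = math.acos(max(-1.0, min(1.0, cos_b)))

    # The sources lie on opposite sides of the mixture, so the signs are opposite.
    candidates = []
    for sign in (1, -1):
        a = cmath.rect(mag_a, theta_y + sign * delta_a)
        b = cmath.rect(mag_b, theta_y - sign * delta_b)
        candidates.append((a, b))
    return candidates


# Usage: build a mixture from two known sources and recover the candidates.
true_a = cmath.rect(1.0, 0.3)
true_b = cmath.rect(0.7, -1.1)
pairs = phase_candidates(true_a + true_b, abs(true_a), abs(true_b))
```

With exact magnitudes, both candidate pairs sum to the mixture and one of them equals the true sources; the other is its mirror image about the mixture phase. Choosing between the two is the sign ambiguity that the three algorithms named in the abstract are meant to resolve.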
Related papers
- End-to-end Speech Separation With Unfolded Iterative Phase Reconstruction (2018) 15.00
- Mask-dependent Phase Estimation For Monaural Speaker Separation (2019) 6.34
- Deep Attractor Network For Single-microphone Speaker Separation (2016) 17.88
- Phase Recovery With Bregman Divergences For Audio Source Separation (2020) 0.00
- Discriminative Learning For Monaural Speech Separation Using Deep Embedding Features (2019) 8.60
- Speaker-independent Speech Separation With Deep Attractor Network (2017) 16.84
- Deep Griffin-Lim Iteration (2019) 0.00
- Two-stage Model And Optimal SI-SNR For Monaural Multi-speaker Speech Separation In Noisy Environment (2020) 0.00