ConSep: A Noise- And Reverberation-robust Speech Separation Framework By Magnitude Conditioning
2024 · Kuan-Hsun Ho, Jeih-Weih Hung, Berlin Chen
Abstract
Speech separation has recently made significant progress thanks to the fine-grained temporal resolution of time-domain methods. However, several studies have shown that using the Short-Time Fourier Transform (STFT) for feature extraction can be beneficial under harsher conditions, such as noise or reverberation. We therefore propose ConSep, a magnitude-conditioned time-domain framework that inherits these beneficial characteristics. Experiments show that ConSep improves performance in anechoic, noisy, and reverberant settings compared with two well-known methods, SepFormer and Bi-Sep. Furthermore, we visualize the components of ConSep to illustrate its advantages and to show that they agree with the observations from our preliminary studies.
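To make the notion of an STFT magnitude feature concrete, the following is a minimal sketch, not the authors' implementation. It only computes per-frame magnitude spectra of a toy signal; how ConSep feeds such magnitudes into its time-domain network is not described in the abstract and is not modeled here. The function name `stft_magnitude`, the frame length, and the hop size are illustrative assumptions.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return per-frame magnitude spectra (non-negative frequency bins only).

    Hypothetical helper for illustration; parameters are assumptions.
    """
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT for clarity; a practical system would use an FFT routine.
        spectrum = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(n_bins)
        ]
        frames.append(spectrum)
    return frames


# Toy "mixture": a sinusoid centered on bin 2 of an 8-point DFT.
mixture = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
magnitudes = stft_magnitude(mixture)
```

Each row of `magnitudes` discards phase and keeps only the energy distribution across frequency for one frame, which is the kind of representation the abstract refers to as a magnitude.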
Related papers
- Monaural Source Separation: From Anechoic To Reverberant Environments (2021) · 10.61
- On Time Domain Conformer Models For Monaural Speech Separation In Noisy Reverberant Acoustic Environments (2023) · 5.84
- Efficient Transformer-based Speech Enhancement Using Long Frames And STFT Magnitudes (2022) · 9.59
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018) · 24.08
- Single-microphone Speaker Separation And Voice Activity Detection In Noisy And Reverberant Environments (2024) · 0.00
- A Multi-stage Triple-path Method For Speech Separation In Noisy And Reverberant Environments (2023) · 2.26
- DPCCN: Densely-connected Pyramid Complex Convolutional Network For Robust Speech Separation And Extraction (2021) · 0.00
- Learning-based Robust Speaker Counting And Separation With The Aid Of Spatial Coherence (2023) · 5.24