TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
2022 · Zhong-Qiu Wang, Samuele Cornell, Shukjae Choi, et al.
Abstract
We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation and dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard dataset for two-speaker separation. To show its robustness to noise and reverberation, we evaluate it on monaural reverberant speaker separation using the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!, and obtain state-of-the-art performance on
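The abstract mentions two ideas that are easy to show in code: stacking real and imaginary (RI) components as features, and measuring quality as SI-SDR improvement. The following is a minimal NumPy sketch of both, not the authors' implementation. The function names (`stack_ri`, `unstack_ri`, `si_sdr`, `si_sdr_improvement`), the `(F, T)` spectrogram layout, the zero-mean normalization, and the `eps` constant are all assumptions made for illustration.

```python
import numpy as np


def stack_ri(spec):
    """Stack real and imaginary parts of a complex spectrogram.

    Assumed layout: (F, T) complex -> (2, F, T) real.
    """
    return np.stack([spec.real, spec.imag], axis=0)


def unstack_ri(ri):
    """Inverse of stack_ri: (2, F, T) real -> (F, T) complex."""
    return ri[0] + 1j * ri[1]


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Under these assumptions, a separation model trained for complex spectral mapping would take the output of `stack_ri` applied to the mixture spectrogram as input and predict an array of the same layout for each target speaker. `si_sdr_improvement` corresponds to the kind of dB figure quoted in the abstract, computed per utterance and then averaged over a test set.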
Related papers
- Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription (2023) 0.00
- Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation (2023) 6.34
- SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation (2023) 13.88
- RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation (2023) 0.00
- CrossNet: Leveraging Global, Cross-Band, Narrow-Band, and Positional Encoding for Single- and Multi-Channel Speaker Separation (2024) 0.00
- Sandglasset: A Light Multi-Granularity Self-Attentive Network for Time-Domain Speech Separation (2021) 11.93
- DMF-Net: A Decoupling-Style Multi-Band Fusion Model for Full-Band Speech Enhancement (2022) 7.16
- Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation (2023) 0.00