CrossNet: Leveraging Global, Cross-band, Narrow-band, and Positional Encoding for Single- and Multi-channel Speaker Separation
2024 · Vahid Ahmadi Kalkhorani, DeLiang Wang
Abstract
We introduce CrossNet, a complex spectral mapping approach to speaker separation and enhancement in reverberant and noisy conditions. The proposed architecture comprises an encoder layer, a global multi-head self-attention module, a cross-band module, a narrow-band module, and an output layer. CrossNet captures global, cross-band, and narrow-band correlations in the time-frequency domain. To address performance degradation in long utterances, we introduce a random chunk positional encoding. Experimental results on multiple datasets demonstrate the effectiveness and robustness of CrossNet, achieving state-of-the-art performance in tasks including reverberant and noisy-reverberant speaker separation. Furthermore, CrossNet exhibits faster and more stable training in comparison to recent baselines. Additionally, CrossNet's high performance extends to multi-microphone conditions, demonstrating its versatility in various acoustic scenarios.
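The abstract names a random chunk positional encoding but does not spell out how it works. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes a sinusoidal table precomputed for more frames than any training utterance, a randomly chosen contiguous chunk of that table during training, and the first frames of the table at inference. The function names, the sinusoidal form, and the sizes are all hypothetical.

```python
import math
import random


def sinusoidal_table(max_len, dim):
    """Build a (max_len x dim) sinusoidal positional-encoding table.

    Assumption: a standard Transformer-style sinusoidal table is used;
    the abstract does not say which encoding the chunks are drawn from.
    """
    table = []
    for pos in range(max_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even columns hold sines, odd columns hold cosines.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def random_chunk_pe(table, num_frames, training, rng=random):
    """Return (start, chunk), where chunk is num_frames consecutive rows.

    Training: the start offset is drawn uniformly, so the model is exposed
    to positions well beyond the length of a single training utterance.
    Inference: the chunk always starts at offset 0 (an assumption).
    """
    max_len = len(table)
    if num_frames > max_len:
        raise ValueError("utterance is longer than the precomputed table")
    start = rng.randint(0, max_len - num_frames) if training else 0
    return start, table[start:start + num_frames]


# Hypothetical sizes: a 100-frame table, 20-frame training utterances.
table = sinusoidal_table(max_len=100, dim=8)
train_start, train_pe = random_chunk_pe(table, 20, training=True)
test_start, test_pe = random_chunk_pe(table, 60, training=False)
```

Under these assumptions, a long test utterance only ever uses positions the model has already seen in some training chunk, which is one way such a scheme could reduce degradation on long inputs.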
Related papers
- X-CrossNet: A Complex Spectral Mapping Approach to Target Speaker Extraction with Cross Attention Speaker Embedding Fusion (2024)
- SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation (2023)
- TasNet: Time-domain Audio Separation Network for Real-time, Single-channel Speech Separation (2017)
- DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation (2020)
- TF-GridNet: Integrating Full- and Sub-band Modeling for Speech Separation (2022)
- Audio-visual Speech Separation and Dereverberation with a Two-stage Multimodal Network (2019)
- Towards Decoupling Frontend Enhancement and Backend Recognition in Monaural Robust ASR (2024)
- Looking Into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation (2021)