SpatialNet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation
2023 · Changsheng Quan, Xiaofei Li
Abstract
This work proposes a neural network, named SpatialNet, that extensively exploits spatial information for multichannel joint speech separation, denoising and dereverberation. The proposed network performs end-to-end speech enhancement in the short-time Fourier transform (STFT) domain. It is mainly composed of interleaved narrow-band and cross-band blocks, which exploit narrow-band and cross-band spatial information, respectively. The narrow-band blocks process each frequency independently: a self-attention mechanism performs spatial-feature-based speaker clustering, and temporal convolutional layers perform temporal smoothing/filtering. The cross-band blocks process each frame independently: a full-band linear layer learns the correlation between all frequencies, and frequency convolutional layers learn the correlation between adjacent frequencies. Experiments are conducted on various simulated and real datasets, and the results show that 1) the proposed network achieves the state-of-the-art
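To make the difference between the two block types concrete, the following is a minimal NumPy sketch of how a feature tensor could be viewed so that frequencies (narrow-band) or frames (cross-band) are processed independently. It is not the authors' implementation. The tensor layout `[batch, freq, time, hidden]`, the function names, the residual connections, the weight-free attention and the 3-tap moving average standing in for the learned convolutions are all assumptions made for illustration.

```python
import numpy as np

def self_attention(seq):
    """Single-head dot-product self-attention along axis 1 (no learned weights; illustrative)."""
    scores = seq @ seq.transpose(0, 2, 1) / np.sqrt(seq.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ seq

def smooth(seq):
    """3-tap moving average along axis 1; a stand-in for a learned 1-D convolution."""
    padded = np.pad(seq, ((0, 0), (1, 1), (0, 0)), mode="edge")
    return (padded[:, :-2] + padded[:, 1:-1] + padded[:, 2:]) / 3.0

def narrow_band_block(x):
    """Each frequency is treated as its own sequence over time."""
    b, f, t, h = x.shape
    seq = x.reshape(b * f, t, h)        # [B*F, T, H]: frequencies folded into the batch
    seq = seq + self_attention(seq)     # attention across frames of one frequency
    seq = seq + smooth(seq)             # temporal smoothing/filtering
    return seq.reshape(b, f, t, h)

def cross_band_block(x, w_full):
    """Each frame is treated as its own sequence over frequency."""
    b, f, t, h = x.shape
    seq = x.transpose(0, 2, 1, 3).reshape(b * t, f, h)      # [B*T, F, H]
    seq = seq + smooth(seq)                                 # adjacent frequencies
    seq = seq + np.einsum("gf,nfh->ngh", w_full, seq)       # full-band mixing of all frequencies
    return seq.reshape(b, t, f, h).transpose(0, 2, 1, 3)    # back to [B, F, T, H]

# Hypothetical sizes: 1 utterance, 4 frequencies, 5 frames, 3 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 5, 3))
w_full = rng.standard_normal((4, 4)) * 0.1   # hypothetical full-band weights [F, F]

y = cross_band_block(narrow_band_block(x), w_full)   # one interleaved pair of blocks
```

In this sketch, perturbing a single frequency of the input changes only that frequency in the output of `narrow_band_block`, and perturbing a single frame changes only that frame in the output of `cross_band_block`, which is the independence property the abstract describes.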
Related papers
- CrossNet: Leveraging Global, Cross-band, Narrow-band, And Positional Encoding For Single- And Multi-channel Speaker Separation (2024) 0.00
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023) 10.35
- Decoupled Spatial And Temporal Processing For Resource Efficient Multichannel Speech Enhancement (2024) 0.00
- Multichannel Long-term Streaming Neural Speech Enhancement For Static And Moving Speakers (2024) 16.05
- Spatial-DCCRN: DCCRN Equipped With Frame-level Angle Feature And Hybrid Filtering For Multi-channel Speech Enhancement (2022) 5.84
- DeFT-AN: Dense Frequency-time Attentive Network For Multichannel Speech Enhancement (2022) 12.10
- Inter-channel Conv-TasNet For Multichannel Speech Enhancement (2021) 0.00
- Temporal-spatial Neural Filter: Direction Informed End-to-end Multi-channel Target Speech Separation (2020) 0.00