Enhancing End-to-end Multi-channel Speech Separation Via Spatial Feature Learning
2020 · Rongzhi Gu, Shi-Xiong Zhang, Lianwu Chen, et al.
Abstract
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods. However, these manually designed spatial features are hard to incorporate into the end-to-end optimized MCSS framework. In this work, we propose an integrated architecture for learning spatial features directly from the multi-channel speech waveforms within an end-to-end speech separation framework. In this architecture, time-domain filters spanning signal channels are trained to perform adaptive spatial filtering. These filters are implemented by a 2d convolution (conv2d) layer and their parameters are optimized using a speech separation objective function in a purely data-driven fashion. Furthermore, inspired by the IPD formulation, we design a conv2d kernel to compute the inter-channel convolution differences (ICDs), which are expected to provide the spatial cues that help to distinguish the directional source
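As a rough illustration of the ICD idea described in the abstract, the sketch below applies one time-domain kernel to two microphone channels and subtracts the results, then shows that the same quantity falls out of a single two-row kernel spanning both channels (the role the conv2d layer plays). The function names, the toy signals, and the kernel values are all hypothetical, and the pairing of one kernel with its negated copy is an assumption about how an ICD-style kernel could be laid out, not the authors' implementation.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Slide the kernel over the signal without padding (cross-correlation,
    which is what deep learning frameworks call convolution)."""
    width = len(kernel)
    return [
        sum(kernel[k] * signal[start + k] for k in range(width))
        for start in range(0, len(signal) - width + 1, stride)
    ]


def icd(ch_a, ch_b, kernel, stride=1):
    """Inter-channel convolution difference for one microphone pair:
    filter both channels with the same kernel, then subtract."""
    out_a = conv1d_valid(ch_a, kernel, stride)
    out_b = conv1d_valid(ch_b, kernel, stride)
    return [a - b for a, b in zip(out_a, out_b)]


def conv2d_pair(ch_a, ch_b, kernel2d, stride=1):
    """Apply a 2 x L kernel spanning two channels: one kernel row per
    channel, with the row outputs summed at each time step."""
    out_a = conv1d_valid(ch_a, kernel2d[0], stride)
    out_b = conv1d_valid(ch_b, kernel2d[1], stride)
    return [a + b for a, b in zip(out_a, out_b)]


# Hypothetical toy data: mic2 is mic1 advanced by one sample.
kernel = [0.5, -1.0, 0.5]
mic1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
mic2 = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0]

icd_direct = icd(mic1, mic2, kernel)
# Assumed layout: the second row is the negated first row, so the
# two-row kernel computes the difference in one pass.
icd_as_conv2d = conv2d_pair(mic1, mic2, [kernel, [-w for w in kernel]])
```

Two identical channels give an all-zero ICD in this sketch, so any non-zero output reflects a difference between the microphones, such as the one-sample offset in the toy data. In a trained model the kernel weights would be learned from the separation objective rather than fixed by hand.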
Related papers
- End-to-end Multi-channel Speech Separation (2019)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023)
- Improving Dual-microphone Speech Enhancement By Learning Cross-channel Features With Multi-head Attention (2022)
- Spatial And Spectral Deep Attention Fusion For Multi-channel Speech Separation Using Deep Embedding Features (2020)
- Temporal-spatial Neural Filter: Direction Informed End-to-end Multi-channel Target Speech Separation (2020)
- Spatialnet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation (2023)
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020)
- Inter-channel Conv-tasnet For Multichannel Speech Enhancement (2021)