End-to-end Multi-channel Speech Separation
2019 · Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, et al.
Abstract
End-to-end approaches to single-channel speech separation have recently been studied and have shown promising results. This paper extends that approach and proposes a new end-to-end model for multi-channel speech separation. The primary contributions are: 1) an integrated waveform-in, waveform-out separation system in a single neural network architecture; 2) a reformulation of the traditional short-time Fourier transform (STFT) and inter-channel phase difference (IPD) as functions of time-domain convolution with a special kernel; and 3) a relaxation of those fixed kernels into learnable ones, so that the entire architecture becomes purely data-driven and can be trained end to end. On the WSJ0 far-field speech separation task, the proposed end-to-end multi-channel model, benefiting from learnable spatial features, significantly improves on the previous end-to-end single-channel method and on traditional multi-channel methods.
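The second contribution can be illustrated with a small NumPy sketch. This is not the authors' implementation. It only shows the general idea that an STFT equals a strided time-domain convolution of the waveform with fixed windowed cosine and sine kernels, and that an IPD can then be read off the phases of two channels. The function names (stft_kernels, conv_stft, ipd), the Hann window, the frame size of 256, the hop of 128, and the synthetic two-microphone signal are all assumptions chosen for illustration.

```python
import numpy as np

def stft_kernels(n_fft, window):
    """Fixed real/imaginary DFT kernels, each of shape (n_fft // 2 + 1, n_fft)."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    angle = 2.0 * np.pi * k * n / n_fft
    # The analysis window is folded into the kernels themselves.
    return window * np.cos(angle), -window * np.sin(angle)

def conv_stft(x, n_fft=256, hop=128):
    """STFT written as a strided correlation of the waveform with fixed kernels."""
    window = np.hanning(n_fft)
    k_re, k_im = stft_kernels(n_fft, window)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Each row is one strided slice of the waveform (the convolution's receptive field).
    frames = np.stack([x[t * hop : t * hop + n_fft] for t in range(n_frames)])
    # One dot product per (frame, kernel) pair, i.e. a 1-D convolution with stride `hop`.
    return frames @ k_re.T + 1j * (frames @ k_im.T)  # shape (n_frames, n_bins)

def ipd(spec_a, spec_b):
    """Inter-channel phase difference, wrapped to (-pi, pi]."""
    return np.angle(spec_a * np.conj(spec_b))

# Hypothetical two-microphone signal: the second channel is a circularly delayed copy.
rng = np.random.default_rng(0)
mic_a = rng.standard_normal(2048)
mic_b = np.roll(mic_a, 3)

spec_a, spec_b = conv_stft(mic_a), conv_stft(mic_b)
spatial_feature = ipd(spec_a, spec_b)  # one phase difference per frame and frequency bin

# Sanity check against NumPy's FFT applied to the same windowed frames.
frames = np.stack([mic_a[t * 128 : t * 128 + 256] for t in range(spec_a.shape[0])])
reference = np.fft.rfft(frames * np.hanning(256), axis=1)
print(np.allclose(spec_a, reference))
```

In this sketch the kernels are constants. The third contribution described above would correspond to treating arrays such as k_re and k_im as trainable parameters in a deep learning framework, initialised however the model designer chooses; that training step is not shown here.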
Related papers
- End-to-end Networks For Supervised Single-channel Speech Separation (2018)
- Enhancing End-to-end Multi-channel Speech Separation Via Spatial Feature Learning (2020)
- Multi-channel Narrow-band Deep Speech Separation With Full-band Permutation Invariant Training (2021)
- Efficient Integration Of Multi-channel Information For Speaker-independent Speech Separation (2020)
- End-to-end Speech Separation With Unfolded Iterative Phase Reconstruction (2018)
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023)
- Temporal-spatial Neural Filter: Direction Informed End-to-end Multi-channel Target Speech Separation (2020)