Furcanet: An End-to-end Deep Gated Convolutional, Long Short-term Memory, Deep Neural Networks For Single Channel Speech Separation
2019 · Ziqiang Shi, Huibin Lin, Liu Liu, et al.
Abstract
Deep gated convolutional networks have proven very effective in single-channel speech separation. However, current state-of-the-art frameworks usually train the gated convolutional networks in the time-frequency (TF) domain. Such an approach limits the attainable scores, for example the upper bound on the signal-to-distortion ratio (SDR) of the separated utterances, and fails to exploit an end-to-end framework. In this paper we present a simple, effective, integrated end-to-end approach to monaural speech separation. It consists of a deep gated convolutional neural network (GCNN) that takes the mixed utterance of two speakers and maps it to two separated utterances, each containing only one speaker's voice. In addition, long short-term memory (LSTM) is employed for long-term temporal modeling. For the objective, we propose to train the network by directly optimizing utterance-level SDR in a permutation invariant training (PIT) style. Our experiments on the
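The objective described in the abstract, utterance-level SDR optimized in a PIT style, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simplified SDR (reference energy over error energy, in dB) rather than any particular evaluation toolkit's definition, uses NumPy instead of a trainable framework, and the function names `sdr_db` and `pit_sdr_loss` are hypothetical.

```python
import itertools

import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Simplified SDR in dB (assumption): reference energy over error energy."""
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps) / (np.sum(error ** 2) + eps))


def pit_sdr_loss(references, estimates):
    """Utterance-level PIT loss for arrays of shape (num_speakers, num_samples).

    Tries every assignment of estimates to references, keeps the one with the
    highest mean SDR, and returns (negative mean SDR, best permutation).
    best_permutation[i] is the index of the estimate matched to reference i.
    """
    num_speakers = references.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        score = np.mean(
            [sdr_db(references[i], estimates[p]) for i, p in enumerate(perm)]
        )
        if score > best_score:
            best_score, best_perm = score, perm
    return -best_score, best_perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.standard_normal((2, 16000))  # two synthetic "speakers"
    # Estimates arrive in swapped order with a little added noise.
    ests = refs[::-1] + 0.01 * rng.standard_normal((2, 16000))
    loss, perm = pit_sdr_loss(refs, ests)
    print(loss, perm)
```

Because the permutation is chosen per utterance, the network is free to emit the two speakers in either output order. In actual training the same computation would be written in a differentiable framework so that the negative SDR can be backpropagated.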
Related papers
- Furcanext: End-to-end Monaural Speech Separation With Dynamic Gated Dilated Temporal Convolutional Networks (2019)
- End-to-end Networks For Supervised Single-channel Speech Separation (2018)
- Multi-channel Narrow-band Deep Speech Separation With Full-band Permutation Invariant Training (2021)
- End-to-end Multi-channel Speech Separation (2019)
- Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters (2023)
- Lafurca: Iterative Refined Speech Separation Based On Context-aware Dual-path Parallel Bi-lstm (2020)
- Inplace Gated Convolutional Recurrent Neural Network For Dual-channel Speech Enhancement (2021)
- End-to-end Training Of Time Domain Audio Separation And Recognition (2019)