Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation
2022 · Yinhao Xu, Jian Zhou, Liang Tao, et al.
Abstract
Recent studies on time-domain audio separation networks (TasNets) have made great strides in speech separation. One of the most representative TasNets is a network with a dual-path segmentation approach. However, the original model, called DPRNN, uses a fixed feature dimension and an unchanged segment size throughout all layers of the network. In this paper, we propose a multi-scale feature fusion transformer network (MSFFT-Net) based on the conventional dual-path structure for single-channel speech separation. The conventional dual-path structure has only one processing path, which stacks several iterative blocks with alternating intra-chunk and inter-chunk operations to capture local and global context information. In contrast, the proposed MSFFT-Net has multiple parallel processing paths, between which feature information can be exchanged. Experiments show that our proposed networks based on the multi-scale feature fusion structure have achieved better
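The conventional dual-path idea that the abstract contrasts against can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' MSFFT-Net: the names `segment` and `dual_path_block`, the 50% chunk overlap, the residual connections, and the mean-pooling stand-ins for the learned intra-chunk and inter-chunk layers are all hypothetical choices made for illustration.

```python
import numpy as np

def segment(features, chunk_size):
    """Split a (channels, frames) array into 50%-overlapping chunks.

    Assumes chunk_size is even and at least 2.
    Returns an array of shape (channels, chunk_size, num_chunks).
    """
    channels, frames = features.shape
    hop = chunk_size // 2
    # Zero-pad the time axis so the chunks tile the sequence exactly.
    if frames > chunk_size:
        pad = (-(frames - chunk_size)) % hop
    else:
        pad = chunk_size - frames
    padded = np.pad(features, ((0, 0), (0, pad)))
    num_chunks = (padded.shape[1] - chunk_size) // hop + 1
    chunks = [padded[:, i * hop : i * hop + chunk_size] for i in range(num_chunks)]
    return np.stack(chunks, axis=-1)

def dual_path_block(chunks, intra_fn, inter_fn):
    """One dual-path iteration: intra-chunk step, then inter-chunk step.

    intra_fn should mix information along axis 1 (within each chunk, local
    context); inter_fn along axis 2 (across chunks, global context).
    Both are placeholders for learned layers such as RNNs or transformers.
    """
    chunks = chunks + intra_fn(chunks)  # residual connection (assumed)
    chunks = chunks + inter_fn(chunks)
    return chunks

# Hypothetical stand-ins for the learned layers: plain mean pooling.
intra = lambda x: x.mean(axis=1, keepdims=True)
inter = lambda x: x.mean(axis=2, keepdims=True)

x = np.arange(20, dtype=float).reshape(2, 10)  # 2 channels, 10 frames
chunks = segment(x, chunk_size=4)              # shape (2, 4, 4)
out = dual_path_block(chunks, intra, inter)    # same shape as chunks
```

In this single-path sketch the chunk size and channel count stay fixed across every block, which is the limitation the abstract attributes to DPRNN; a multi-scale variant would run several such paths at different scales and exchange features between them.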
Related papers
- TasNet: Time-domain Audio Separation Network For Real-time, Single-channel Speech Separation (2017)
- Dual-path Transformer Network: Direct Context-aware Modeling For End-to-end Monaural Speech Separation (2020)
- End-to-end Multi-channel Speech Separation (2019)
- DasFormer: Deep Alternating Spectrogram Transformer For Multi/single-channel Speech Separation (2023)
- Conv-TasNet: Surpassing Ideal Time-frequency Magnitude Masking For Speech Separation (2018)
- An Efficient Speech Separation Network Based On Recurrent Fusion Dilated Convolution And Channel Attention (2023)
- Dual-path Filter Network: Speaker-aware Modeling For Speech Separation (2021)
- Multi-dimensional And Multi-scale Modeling For Speech Separation Optimized By Discriminative Learning (2023)