An Efficient Speech Separation Network Based On Recurrent Fusion Dilated Convolution And Channel Attention
2023 · Junyu Wang
Abstract
We present ARFDCN, an efficient speech separation neural network that combines dilated convolutions, multi-scale fusion (MSF), and channel attention to overcome the limited receptive field of convolution-based networks and the high computational cost of transformer-based networks. The proposed network follows an encoder-decoder architecture. Dilated convolutions with gradually increasing dilation rates learn local and global features, and fusing these features at adjacent stages lets the model learn rich feature content. In addition, channel attention modules let the model extract channel weights and emphasize the more important features, which improves its expressive power and robustness. Experimental results indicate that the model achieves a decent balance between performance and computational efficiency, making it a promising alternative to current mainstream models for practical applications.
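To make the receptive-field argument concrete, the following is a minimal sketch of how stacking dilated 1-D convolutions widens the receptive field compared with undilated ones. The kernel size of 3, the four-layer depth, and the doubling dilation schedule are assumptions chosen for illustration and are not taken from the paper; `receptive_field` is a hypothetical helper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Assumed illustrative settings, not values from the paper.
KERNEL_SIZE = 3
plain = receptive_field(KERNEL_SIZE, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(KERNEL_SIZE, [1, 2, 4, 8])  # dilation doubles per layer

print(plain, dilated)
```

With the same number of layers and parameters, the dilated stack covers a much longer context, which is the property the abstract relies on when it contrasts dilated convolutions with ordinary convolution-based networks.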
Related papers
- Speech Separation Using An Asynchronous Fully Recurrent Convolutional Neural Network (2021) 0.00
- Multi-scale Feature Fusion Transformer Network For End-to-end Single Channel Speech Separation (2022) 0.00
- AMFFCN: Attentional Multi-layer Feature Fusion Convolution Network For Audio-visual Speech Enhancement (2021) 0.00
- FurcaNeXt: End-to-end Monaural Speech Separation With Dynamic Gated Dilated Temporal Convolutional Networks (2019) 12.40
- DCF-DS: Deep Cascade Fusion Of Diarization And Separation For Speech Recognition Under Realistic Single-channel Conditions (2024) 3.58
- EfficientASR: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization (2024) 3.58
- Attention Is All You Need In Speech Separation (2020) 20.59
- Multi-loss Convolutional Network With Time-frequency Attention For Speech Enhancement (2023) 0.00