Efficient Encoder-decoder And Dual-path Conformer For Comprehensive Feature Learning In Speech Enhancement
2023 · Junyu Wang
Abstract
Current speech enhancement (SE) research has largely neglected channel attention and spatial attention, and networks based on the encoder-decoder architecture have not adequately considered how to provide efficient inputs to the intermediate enhancement layer. To address these issues, this paper proposes a time-frequency (T-F) domain SE network (DPCFCS-Net) that incorporates improved densely connected blocks, dual-path modules, convolution-augmented transformers (conformers), channel attention, and spatial attention. Compared with previous models, the proposed model has a more efficient encoder-decoder and can learn comprehensive features. Experimental results on the VCTK+DEMAND dataset demonstrate that the method outperforms existing techniques in SE performance. Furthermore, the improved densely connected block and the two-dimensional attention module developed in this work are highly adaptable and can be easily integrated into existing networks.
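To make the two attention dimensions named in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's module: the function names, the nested-list layout [channel][time][frequency], and the parameter-free gating (a sigmoid of a pooled mean) are all assumptions chosen for illustration. Real channel and spatial attention blocks typically use learned layers in place of these fixed gates.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a gate computed from its global mean.

    feat is a nested list indexed as feat[channel][time][frequency].
    Assumption: the gate is sigmoid(mean); a real block would learn it.
    """
    out = []
    for channel in feat:
        count = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / count
        gate = sigmoid(mean)
        out.append([[value * gate for value in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each T-F bin by a gate computed from its mean across channels.

    Assumption: the gate is sigmoid(mean over channels); a real block
    would usually apply a learned convolution to pooled channel maps.
    """
    n_ch = len(feat)
    n_t = len(feat[0])
    n_f = len(feat[0][0])
    gates = [
        [sigmoid(sum(feat[c][t][f] for c in range(n_ch)) / n_ch)
         for f in range(n_f)]
        for t in range(n_t)
    ]
    return [
        [[feat[c][t][f] * gates[t][f] for f in range(n_f)]
         for t in range(n_t)]
        for c in range(n_ch)
    ]


# Toy feature map: 2 channels, 1 time frame, 2 frequency bins.
features = [[[0.0, 0.0]], [[2.0, 2.0]]]
attended = spatial_attention(channel_attention(features))
```

Channel attention reweights whole feature maps, while spatial attention reweights individual T-F bins; applying one after the other, as in the last line, is one common ordering and is likewise an assumption here.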
Related papers
- DF-Conformer: Integrated Architecture Of Conv-TasNet And Conformer Using Linear Complexity Self-attention For Speech Enhancement (2021) · 11.29
- ForkNet: Simultaneous Time And Time-frequency Domain Modeling For Speech Enhancement (2023) · 0.00
- Time-domain Speech Enhancement Assisted By Multi-resolution Frequency Encoder And Decoder (2023) · 9.76
- Multichannel Speech Enhancement By Raw Waveform-mapping Using Fully Convolutional Networks (2019) · 12.25
- VSANet: Real-time Speech Enhancement Based On Voice Activity Detection And Causal Spatial Attention (2023) · 5.24
- Uformer: A Unet Based Dilated Complex & Real Dual-path Conformer Network For Simultaneous Speech Enhancement And Dereverberation (2021) · 12.87
- D2Former: A Fully Complex Dual-path Dual-decoder Conformer Network Using Joint Complex Masking And Complex Spectral Mapping For Monaural Speech Enhancement (2023) · 0.00
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement (2022) · 0.00