Multi-loss Convolutional Network With Time-frequency Attention For Speech Enhancement
2023 · Liang Wan, Hongqing Liu, Yi Zhou, et al.
Abstract
The Dual-Path Convolution Recurrent Network (DPCRN) was proposed to effectively exploit time-frequency domain information. By combining the DPRNN module with the Convolution Recurrent Network (CRN), DPCRN achieved promising performance in speech separation with a limited model size. In this paper, we explore self-attention in the DPCRN module and design a model called Multi-Loss Convolutional Network with Time-Frequency Attention (MNTFA) for speech enhancement. We use self-attention modules to exploit long-term information: intra-chunk self-attention models the spectral pattern, and inter-chunk self-attention models the dependence between consecutive frames. Compared to DPRNN, axial self-attention greatly reduces memory and computation requirements, making it more suitable for long speech sequences. In addition, we propose a joint training method that combines a multi-resolution STFT loss with a WavLM loss computed with a pre-trained WavLM network. Experi
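The axial attention idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a feature map of shape (T, F, C), uses identity query/key/value projections, and adds a plain residual connection. The function names (`self_attention`, `axial_attention`, `score_entries`) are hypothetical. The helper at the end counts attention-score entries for axial attention versus joint attention over all T×F positions, which is one way to see the memory saving. It does not reproduce the paper's comparison with DPRNN.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis.

    x has shape (..., L, C). Q, K and V are all x itself (identity
    projections), an assumption made only to keep the sketch short.
    """
    scale = np.sqrt(x.shape[-1])
    scores = x @ np.swapaxes(x, -1, -2) / scale  # (..., L, L)
    return softmax(scores, axis=-1) @ x          # (..., L, C)


def axial_attention(x):
    """Apply attention along frequency, then along time.

    x has shape (T, F, C): T frames, F frequency bins, C channels.
    """
    # Frequency axis: each frame is a batch item, attention spans F bins.
    x = x + self_attention(x)
    # Time axis: each frequency bin is a batch item, attention spans T frames.
    xt = np.swapaxes(x, 0, 1)                    # (F, T, C)
    xt = xt + self_attention(xt)
    return np.swapaxes(xt, 0, 1)                 # back to (T, F, C)


def score_entries(T, F):
    """Count attention-score entries: axial vs. joint over all T*F positions."""
    axial = T * F * F + F * T * T
    joint = (T * F) ** 2
    return axial, joint


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((6, 5, 4))
    print(axial_attention(feats).shape)
    print(score_entries(100, 64))
```

Under these assumptions, the score matrices grow as T·F² + F·T² rather than (T·F)², so the gap widens as the number of frames increases.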
Related papers
- DPCRN: Dual-path Convolution Recurrent Network For Single Channel Speech Enhancement (2021)
- Dense CNN With Self-attention For Time-domain Speech Enhancement (2020)
- Monaural Speech Enhancement Using A Multi-branch Temporal Convolutional Network (2019)
- PDPCRN: Parallel Dual-path CRN With Bi-directional Inter-branch Interactions For Multi-channel Speech Enhancement (2023)
- Complex Spectral Mapping With Attention Based Convolution Recurrent Neural Network For Speech Enhancement (2021)
- DCCRN: Deep Complex Convolution Recurrent Network For Phase-aware Speech Enhancement (2020)
- Dual-path RNN: Efficient Long Sequence Modeling For Time-domain Single-channel Speech Separation (2019)
- An Efficient Speech Separation Network Based On Recurrent Fusion Dilated Convolution And Channel Attention (2023)