Dual-path Self-attention RNN For Real-time Speech Enhancement
2020 · Ashutosh Pandey, Deliang Wang
Abstract
We propose a dual-path self-attention recurrent neural network (DP-SARNN) for time-domain speech enhancement. We improve dual-path RNN (DP-RNN) by augmenting inter-chunk and intra-chunk RNN with a recently proposed efficient attention mechanism. The combination of inter-chunk and intra-chunk attention improves the attention mechanism for long sequences of speech frames. DP-SARNN outperforms a baseline DP-RNN by using a frame shift four times larger than in DP-RNN, which leads to a substantially reduced computation time per utterance. As a result, we develop a real-time DP-SARNN by using long short-term memory (LSTM) RNN and causal attention in inter-chunk SARNN. DP-SARNN significantly outperforms existing approaches to speech enhancement, and on average takes 7.9 ms CPU time to process a signal chunk of 32 ms.
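The dual-path idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes non-overlapping chunks, leaves out the RNN layers entirely, and uses plain scaled dot-product attention with no learned projections in place of the efficient attention mechanism the abstract mentions. The names `segment`, `self_attention`, and `dual_path_block` are hypothetical. The sketch shows only the data flow: attention within each chunk first, then causal attention across chunks.

```python
import numpy as np

def segment(frames, chunk_size):
    """Split [T, D] frames into chunks of shape [C, chunk_size, D].

    Assumption: non-overlapping chunks, zero-padded at the end.
    """
    num_frames, _ = frames.shape
    pad = (-num_frames) % chunk_size
    padded = np.pad(frames, ((0, pad), (0, 0)))
    return padded.reshape(-1, chunk_size, frames.shape[1])

def self_attention(x, causal=False):
    """Scaled dot-product self-attention over axis -2 of x [..., N, D].

    Assumption: queries, keys, and values are all x (no learned weights).
    """
    dim = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(dim)  # [..., N, N]
    if causal:
        n = x.shape[-2]
        future = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # hide future positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def dual_path_block(frames, chunk_size):
    """Intra-chunk attention followed by causal inter-chunk attention."""
    chunks = segment(frames, chunk_size)             # [C, K, D]
    intra = self_attention(chunks)                   # within each chunk
    across = np.swapaxes(intra, 0, 1)                # [K, C, D]
    inter = self_attention(across, causal=True)      # across chunks, causal
    return np.swapaxes(inter, 0, 1)                  # [C, K, D]
```

With the causal mask on the inter-chunk path, the output for a given chunk depends only on that chunk and earlier ones, which is the property that makes chunk-by-chunk streaming possible. On the timing quoted in the abstract, 7.9 ms of CPU time per 32 ms chunk corresponds to a real-time factor of about 0.25.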
Related papers
- TPARN: Triple-path Attentive Recurrent Network For Time-domain Multichannel Speech Enhancement (2021)
- DPCRN: Dual-path Convolution Recurrent Network For Single Channel Speech Enhancement (2021)
- Inference Skipping For More Efficient Real-time Speech Enhancement With Parallel RNNs (2022)
- Multi-loss Convolutional Network With Time-frequency Attention For Speech Enhancement (2023)
- Dual-path RNN: Efficient Long Sequence Modeling For Time-domain Single-channel Speech Separation (2019)
- PDPCRN: Parallel Dual-path CRN With Bi-directional Inter-branch Interactions For Multi-channel Speech Enhancement (2023)
- Dual-path Cross-modal Attention For Better Audio-visual Speech Extraction (2022)
- Full Attention Bidirectional Deep Learning Structure For Single Channel Speech Enhancement (2021)