Dense CNN With Self-attention For Time-domain Speech Enhancement
2020 · Ashutosh Pandey, Deliang Wang
Abstract
Speech enhancement in the time domain has become increasingly popular in recent years due to its capability to jointly enhance both the magnitude and the phase of speech. In this work, we propose a dense convolutional network (DCN) with self-attention for speech enhancement in the time domain. DCN is an encoder-decoder architecture with skip connections. Each layer in the encoder and the decoder comprises a dense block and an attention module. Dense blocks and attention modules help in feature extraction using a combination of feature reuse, increased network depth, and maximum context aggregation. Furthermore, we reveal previously unknown problems with a loss based on the spectral magnitude of enhanced speech. To alleviate these problems, we propose a novel loss based on the magnitudes of the enhanced speech and a predicted noise. Even though the proposed loss is based on magnitudes only, a constraint imposed by noise prediction ensures that the loss enhances both magnitude and phase.
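The abstract does not spell out the loss formula, so the following is only a minimal sketch of how a magnitude loss on both enhanced speech and predicted noise could look. It assumes that the predicted noise is the noisy input minus the enhanced speech, that the two terms are weighted equally, and that each term is a mean absolute difference of STFT magnitudes. The function names, frame length, hop size, and weights are all hypothetical and are not taken from the paper.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=8, hop=4):
    """Return |STFT| as a list of frames, each a list of bin magnitudes."""
    # Hann window; a naive O(N^2) DFT keeps the sketch dependency-free.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def magnitude_l1(reference, estimate):
    """Mean absolute difference between the STFT magnitudes of two signals."""
    ref_mag = stft_magnitudes(reference)
    est_mag = stft_magnitudes(estimate)
    total = sum(abs(r - e)
                for ref_frame, est_frame in zip(ref_mag, est_mag)
                for r, e in zip(ref_frame, est_frame))
    count = sum(len(frame) for frame in ref_mag)
    return total / count


def speech_plus_noise_magnitude_loss(noisy, clean, enhanced):
    """Equal-weight sum of magnitude losses on speech and on predicted noise."""
    # Assumption: the predicted noise is whatever remains of the noisy input
    # after the enhanced speech is subtracted.
    predicted_noise = [x - s_hat for x, s_hat in zip(noisy, enhanced)]
    true_noise = [x - s for x, s in zip(noisy, clean)]
    speech_term = magnitude_l1(clean, enhanced)
    noise_term = magnitude_l1(true_noise, predicted_noise)
    return 0.5 * speech_term + 0.5 * noise_term
```

Under these assumptions, a sign-flipped copy of the clean speech has exactly the same magnitudes as the clean speech, so the speech term alone cannot penalize it. The noise term does, because subtracting the flipped signal from the noisy input leaves a residual that no longer matches the true noise. This is one way to read the abstract's claim that the noise-prediction constraint makes a magnitude-only loss sensitive to phase.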
Related papers
- Multi-loss Convolutional Network With Time-frequency Attention For Speech Enhancement (2023)
- DeFT-AN: Dense Frequency-time Attentive Network For Multichannel Speech Enhancement (2022)
- Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform (2021)
- Efficient Encoder-decoder And Dual-path Conformer For Comprehensive Feature Learning In Speech Enhancement (2023)
- Complex Spectral Mapping With Attention Based Convolution Recurrent Neural Network For Speech Enhancement (2021)
- DCCRN: Deep Complex Convolution Recurrent Network For Phase-aware Speech Enhancement (2020)
- Dense-TSNet: Dense Connected Two-stage Structure For Ultra-lightweight Speech Enhancement (2024)
- Concatenated Identical DNN (CI-DNN) To Reduce Noise-type Dependence In DNN-based Speech Enhancement (2018)