DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement
2022 · Dongheon Lee, Jung-Woo Choi
Abstract
In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation contained in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It uses a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. Dense blocks and transformers dedicated to these three characteristics of audio signals enable more comprehensive enhancement in noisy and reverberant environments. Experiments on two popular noisy and reverberant datasets show that DeFT-AN outperforms state-of-the-art multichannel models on various metrics of speech quality and intelligibility.
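The abstract describes DeFT-AN as predicting a complex spectral mask that is applied to the STFT of the input. The following is a minimal sketch, in plain Python, of what applying such a mask means. It assumes the mask is multiplied bin by bin onto a single reference-channel STFT (the abstract does not say which channel is masked), and it uses an oracle complex ratio mask computed from a known clean signal in place of the network's prediction. The function names `apply_complex_mask` and `ideal_complex_ratio_mask` are hypothetical; this is not the authors' code.

```python
"""Sketch of complex spectral masking (illustrative, not the DeFT-AN code).

Assumptions: STFTs are nested lists of complex numbers indexed
[time frame][frequency bin]; the mask is applied to one reference channel;
the oracle mask below stands in for a network's predicted mask.
"""
from typing import List

Spectrogram = List[List[complex]]  # [time frame][frequency bin]


def apply_complex_mask(mask: Spectrogram, noisy_ref: Spectrogram) -> Spectrogram:
    """Multiply each time-frequency bin of the reference-channel STFT by the
    complex mask, which rescales its magnitude and rotates its phase."""
    if len(mask) != len(noisy_ref):
        raise ValueError("mask and STFT must have the same number of frames")
    enhanced: Spectrogram = []
    for m_row, x_row in zip(mask, noisy_ref):
        if len(m_row) != len(x_row):
            raise ValueError("mask and STFT must have the same number of bins")
        enhanced.append([m * x for m, x in zip(m_row, x_row)])
    return enhanced


def ideal_complex_ratio_mask(clean: Spectrogram, noisy: Spectrogram,
                             eps: float = 1e-8) -> Spectrogram:
    """Oracle mask M = S / X per bin, a common training target.
    Bins whose noisy magnitude is at or below eps get a zero mask."""
    return [
        [s / x if abs(x) > eps else 0j for s, x in zip(s_row, x_row)]
        for s_row, x_row in zip(clean, noisy)
    ]


if __name__ == "__main__":
    clean = [[1 + 1j, 0.5 - 0.2j], [0.3j, -0.4 + 0j]]
    noise = [[0.2 - 0.1j, -0.3 + 0.4j], [0.1 + 0.1j, 0.05 - 0.2j]]
    noisy = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(clean, noise)]

    mask = ideal_complex_ratio_mask(clean, noisy)
    enhanced = apply_complex_mask(mask, noisy)

    # With the oracle mask, the masked STFT matches the clean STFT up to
    # floating-point rounding.
    max_err = max(abs(e - s) for er, sr in zip(enhanced, clean)
                  for e, s in zip(er, sr))
    print(f"max reconstruction error: {max_err:.2e}")
```

Because the mask is complex, each bin is both rescaled and phase-rotated, so a mask estimator of this kind can correct phase as well as magnitude, which a real-valued magnitude mask cannot do.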
Related papers
- SpatialNet: Extensively Learning Spatial Information For Multichannel Joint Speech Separation, Denoising And Dereverberation (2023)
- DeFTAN-II: Efficient Multichannel Speech Enhancement With Subgroup Processing (2023)
- Dense CNN With Self-attention For Time-domain Speech Enhancement (2020)
- Multichannel Speech Enhancement Without Beamforming (2021)
- Decoupled Spatial And Temporal Processing For Resource Efficient Multichannel Speech Enhancement (2024)
- LMFCA-Net: A Lightweight Model For Multi-channel Speech Enhancement With Efficient Narrow-band And Cross-band Attention (2025)
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021)
- Narrow-band Deep Filtering For Multichannel Speech Enhancement (2019)