DeFTAN-II: Efficient Multichannel Speech Enhancement With Subgroup Processing
2023 · Dongheon Lee, Jung-Woo Choi
Abstract
In this work, we present DeFTAN-II, an efficient multichannel speech enhancement model based on transformer architecture and subgroup processing. Despite the success of transformers in speech enhancement, they face challenges in capturing local relations, reducing the high computational complexity, and lowering memory usage. To address these limitations, we introduce subgroup processing in our model, combining subgroups of locally emphasized features with other subgroups containing original features. The subgroup processing is implemented in several blocks of the proposed network. In the proposed split dense blocks extracting spatial features, a pair of subgroups is sequentially concatenated and processed by convolution layers to effectively reduce the computational complexity and memory usage. For the F- and T-transformers extracting temporal and spectral relations, we introduce cross-attention between subgroups to identify relationships between locally emphasized and non-emphasized f
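The cross-attention between subgroups mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the subgroups are two equal halves of the channel dimension, that queries come from one subgroup and keys and values from the other, and that there are no learned projections. The names `split_subgroups` and `cross_attention` are hypothetical.

```python
import math

def split_subgroups(x):
    """Split each feature vector (row) into two equal channel subgroups.
    Assumption: the subgroups are simply the first and second channel halves."""
    half = len(x[0]) // 2
    return [row[:half] for row in x], [row[half:] for row in x]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys_values):
    """Scaled dot-product attention with queries from one subgroup and
    keys/values from the other (identity projections, for illustration only)."""
    d = len(queries[0])
    width = len(keys_values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(width)])
    return out

# Toy input: 3 frames (rows), 4 channels per frame.
x = [[1.0, 0.0, 0.5, 0.5],
     [0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0, 1.0]]
a, b = split_subgroups(x)   # each subgroup: 3 frames x 2 channels
y = cross_attention(a, b)   # 3 frames x 2 channels
```

Each output row is a weighted average of the rows of the second subgroup, with weights determined by similarity to the corresponding row of the first subgroup. Attention is computed on half-width features here, which is where a cost saving could come from in this toy setup.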
Related papers
- DeFT-AN: Dense Frequency-time Attentive Network For Multichannel Speech Enhancement (2022)
- Study Of Lightweight Transformer Architectures For Single-channel Speech Enhancement (2025)
- DPT-FSNet: Dual-path Transformer Based Full-band And Sub-band Fusion Network For Speech Enhancement (2021)
- Dual-branch Attention-in-attention Transformer For Single-channel Speech Enhancement (2021)
- Decoupled Spatial And Temporal Processing For Resource Efficient Multichannel Speech Enhancement (2024)
- Efficient Encoder-decoder And Dual-path Conformer For Comprehensive Feature Learning In Speech Enhancement (2023)
- TSTNN: Two-stage Transformer Based Neural Network For Speech Enhancement In The Time Domain (2021)
- Speech Enhancement With Perceptually-motivated Optimization And Dual Transformations (2022)