Relunet: Relative Channel Fusion U-net For Multichannel Speech Enhancement
2024 · Ibrahim Aldarmaki, Thamar Solorio, Bhiksha Raj, et al.
Abstract
Neural multi-channel speech enhancement models, in particular those based on the U-Net architecture, demonstrate promising performance and generalization potential. These models typically encode input channels independently and integrate them in later stages of the network. In this paper, we propose a novel modification of these models that incorporates relative information from the outset: each channel is stacked with a reference channel and the pair is processed together. This input strategy exploits comparative differences to adaptively fuse information between channels, thereby capturing crucial spatial information and enhancing overall performance. Experiments on the CHiME-3 dataset demonstrate improvements in speech enhancement metrics across various architectures.
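As a rough illustration of the input strategy described in the abstract, the sketch below pairs every channel with a reference channel by stacking. It is not the authors' implementation: the function name `stack_with_reference`, the `(C, F, T)` feature layout, the choice of reference index, and the random example data are all assumptions made for illustration.

```python
import numpy as np


def stack_with_reference(channels: np.ndarray, ref_index: int = 0) -> np.ndarray:
    """Pair every channel with a reference channel by stacking.

    channels: array of shape (C, F, T), e.g. one feature map per microphone
              (hypothetical layout: channels, frequency bins, time frames).
    Returns an array of shape (C, 2, F, T), where [:, 0] holds each channel
    and [:, 1] holds a copy of the reference channel.
    """
    if channels.ndim != 3:
        raise ValueError("expected an array of shape (C, F, T)")
    # Repeat the reference channel so it lines up with every input channel.
    ref = np.broadcast_to(channels[ref_index], channels.shape)
    # New axis 1 holds the (channel, reference) pair.
    return np.stack([channels, ref], axis=1)


# Hypothetical example: 6 microphones, 257 frequency bins, 100 frames.
rng = np.random.default_rng(0)
mics = rng.standard_normal((6, 257, 100))
stacked = stack_with_reference(mics, ref_index=4)
print(stacked.shape)
```

Each `(2, F, T)` pair could then be fed to a per-channel encoder, so the network sees the reference alongside every channel from the first layer rather than only at a later fusion stage.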
Related papers
- Dilated U-net Based Approach For Multichannel Speech Enhancement From First-order Ambisonics Recordings (2020) 0.00
- Single-channel Speech Enhancement With Deep Complex U-networks And Probabilistic Latent Space Models (2023) 5.24
- UL-UNAS: Ultra-lightweight U-nets For Real-time Speech Enhancement Via Network Architecture Search (2025) 10.26
- Using Recurrences In Time And Frequency Within U-net Architecture For Speech Enhancement (2018) 8.35
- Real-time Streaming Wave-u-net With Temporal Convolutions For Multichannel Speech Enhancement (2021) 0.00
- U-former: Improving Monaural Speech Enhancement With Multi-head Self And Cross Attention (2022) 0.00
- Thlnet: Two-stage Heterogeneous Lightweight Network For Monaural Speech Enhancement (2023) 0.00
- Lmfca-net: A Lightweight Model For Multi-channel Speech Enhancement With Efficient Narrow-band And Cross-band Attention (2025) 3.58