Neural Network-based Time-frequency-bin-wise Linear Combination Of Beamformers For Underdetermined Target Source Extraction
2026 · Changda Chen, Yichen Yang, Wei Liu, et al.
Abstract
Extracting a target source from underdetermined mixtures is challenging for beamforming approaches. Recently proposed time-frequency-bin-wise switching (TFS) and linear combination (TFLC) strategies mitigate this by combining multiple beamformers in each time-frequency (TF) bin and choosing combination weights that minimize the output power. However, making this decision independently for each TF bin can weaken temporal-spectral coherence, causing discontinuities and consequently degrading extraction performance. In this paper, we propose a novel neural network-based time-frequency-bin-wise linear combination (NN-TFLC) framework that constructs minimum power distortionless response (MPDR) beamformers without explicit noise covariance estimation. The network encodes the mixture and beamformer outputs, and predicts temporally and spectrally coherent linear combination weights via a cross-attention mechanism. On dual-microphone mixtures with multiple interferers, NN-TFLC-MPDR consistently
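The switching idea summarized in the abstract can be illustrated with a small sketch. It covers only the conventional TF-bin-wise switching baseline (one-hot combination weights chosen by minimum output power), not the proposed network. The function name `tf_switch` and the `outputs[k][t][f]` layout are assumptions made for illustration. The reasoning is that each candidate beamformer is distortionless toward the target, so in any given bin the output with the lowest power should contain the least residual interference.

```python
def tf_switch(outputs):
    """Pick, per TF bin, the beamformer output with the lowest power.

    outputs[k][t][f] is the complex STFT output of beamformer k at
    frame t and frequency bin f (hypothetical layout for this sketch).
    Returns (selected, index): the chosen output and the chosen
    beamformer index for every bin.
    """
    n_frames = len(outputs[0])
    n_bins = len(outputs[0][0])
    selected, index = [], []
    for t in range(n_frames):
        sel_row, idx_row = [], []
        for f in range(n_bins):
            # Instantaneous output power of every candidate in this bin.
            powers = [abs(y[t][f]) ** 2 for y in outputs]
            # One-hot combination weight: keep the minimum-power candidate.
            k = min(range(len(outputs)), key=powers.__getitem__)
            idx_row.append(k)
            sel_row.append(outputs[k][t][f])
        selected.append(sel_row)
        index.append(idx_row)
    return selected, index


# Toy data: two beamformers, 2 frames x 2 frequency bins.
y_a = [[1 + 0j, 0.1j], [0.5 + 0.5j, 2 + 0j]]
y_b = [[0.2 + 0j, 1 + 1j], [1 + 0j, 0.3 - 0.1j]]
selected, index = tf_switch([y_a, y_b])
print(index)  # [[1, 0], [0, 1]]
```

In this toy case the selected index flips between neighbouring bins, because every bin is decided in isolation. That is the kind of temporal-spectral incoherence the abstract attributes to independent per-bin decisions.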
Related papers
- Dual-path Transformer Based Neural Beamformer For Target Speech Extraction (2023) · 0.00
- Multichannel Loss Function For Supervised Speech Source Separation By Mask-based Beamforming (2019) · 7.50
- Enhanced Neural Beamformer With Spatial Information For Target Speech Extraction (2023) · 2.26
- Improving Speaker Discrimination Of Target Speech Extraction With Time-domain SpeakerBeam (2020) · 14.76
- Sequential Multi-frame Neural Beamforming For Speech Separation And Enhancement (2019) · 0.00
- Locate And Beamform: Two-dimensional Locating All-neural Beamformer For Multi-channel Speech Separation (2023) · 3.58
- Towards Unified All-neural Beamforming For Time And Frequency Domain Speech Separation (2022) · 11.29
- FaSNet: Low-latency Adaptive Beamforming For Multi-microphone Audio Processing (2019) · 0.00