Target Speaker Selection For Neural Network Beamforming In Multi-speaker Scenarios
2025 · Luan Vinícius Fiorio, Bruno Defraene, Johan David, et al.
Abstract
We propose a speaker selection mechanism (SSM) for the training of an end-to-end beamforming neural network, based on recent findings that a listener usually looks toward the target speaker with a certain undershot angle. The mechanism allows the neural network model to learn, during training, which speaker to focus on in a multi-speaker scenario, based on the positions of the listener and the speakers. At inference, only audio information is required. We perform acoustic simulations demonstrating the feasibility and performance of the model when the SSM is employed in training. The results show a significant improvement in speech intelligibility, quality, and distortion metrics compared with the minimum variance distortionless filter and with the same neural network model trained without the SSM. The success of the proposed method is a significant step toward solving the cocktail party problem.
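The abstract does not spell out the selection rule, so the following is only a minimal sketch of one way a position-based choice of training target could look. It assumes a 2-D layout, a head-turn angle measured from the listener's straight-ahead direction (taken here as the +x axis), and a hypothetical constant `undershoot_ratio` relating head turn to the true target angle. The function names, the value 0.6, and the nearest-angle rule are illustrative assumptions, not the authors' method.

```python
import math


def wrap_deg(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def select_target_speaker(listener_xy, head_turn_deg, speakers_xy,
                          undershoot_ratio=0.6):
    """Return the index of the speaker most consistent with the head turn.

    Assumption (hypothetical): the listener turns the head by only
    `undershoot_ratio` times the true angle to the target, so the expected
    target azimuth is head_turn_deg / undershoot_ratio.
    """
    lx, ly = listener_xy
    expected_azimuth = head_turn_deg / undershoot_ratio
    best_index, best_error = None, float("inf")
    for index, (sx, sy) in enumerate(speakers_xy):
        # Azimuth of this speaker as seen from the listener, in degrees.
        azimuth = math.degrees(math.atan2(sy - ly, sx - lx))
        error = abs(wrap_deg(azimuth - expected_azimuth))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


# Usage: head turned 30 degrees, so the expected target lies near 50 degrees.
speakers = [(5.0, 1.0), (3.0, 3.0), (3.0, -3.0)]
target = select_target_speaker((0.0, 0.0), 30.0, speakers)
```

In a training pipeline along the lines described above, an index like `target` would only decide which speaker's clean signal serves as the training reference; consistent with the abstract, no position input would be needed at inference.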