Multi-talker MVDR Beamforming Based On Extended Complex Gaussian Mixture Model
2019 · Hangting Chen, Pengyuan Zhang, Yonghong Yan
Abstract
In this letter, we present a novel multi-talker minimum variance distortionless response (MVDR) beamformer as the front-end of an automatic speech recognition (ASR) system in a dinner party scenario. The CHiME-5 dataset is selected to evaluate our proposal in overlapping multi-talker scenarios with severe noise. A detailed study on beamforming is conducted based on the proposed extended complex Gaussian mixture model (CGMM) integrated with various speech separation and speech enhancement masks. Three main changes are made to adapt the original CGMM-based MVDR to the multi-talker scenario. First, the number of Gaussian distributions is extended to 3 with an additional interference speaker model. Second, the mixture coefficients are introduced as a supervisor to generate more elaborate masks and avoid the permutation problem. Third, we reorganize the MVDR and mask-based speech separation to achieve both noise reduction and target speaker extraction. With the official baseline ASR back
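For orientation, the generic mask-driven MVDR pipeline that the abstract builds on can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the function names, the (F, T, M) array layout, the reference-microphone (Souden-style) solution, and the diagonal loading are all assumptions, and the time-frequency masks are taken as given rather than estimated by the extended CGMM.

```python
import numpy as np

def masked_covariance(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance for each frequency bin.

    Y:    (F, T, M) complex STFT with F bins, T frames, M microphones.
    mask: (F, T) real-valued weights in [0, 1].
    Returns an (F, M, M) stack of Hermitian matrices.
    """
    num = np.einsum("ft,ftm,ftn->fmn", mask, Y, Y.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(phi_target, phi_other, ref_mic=0, diag_load=1e-6):
    """Reference-microphone MVDR: w = (Phi_o^-1 Phi_t) u / trace(Phi_o^-1 Phi_t)."""
    M = phi_target.shape[-1]
    # Diagonal loading (an assumed regularisation) keeps the solve stable.
    load = diag_load * np.trace(phi_other, axis1=1, axis2=2).real / M
    phi_other = phi_other + load[:, None, None] * np.eye(M)
    ratio = np.linalg.solve(phi_other, phi_target)   # (F, M, M)
    trace = np.trace(ratio, axis1=1, axis2=2)        # (F,)
    return ratio[:, :, ref_mic] / (trace[:, None] + 1e-10)

def beamform(Y, w):
    """Apply the beamformer: out[f, t] = w[f]^H Y[f, t]."""
    return np.einsum("fm,ftm->ft", w.conj(), Y)
```

With three masks available (target speaker, interfering speaker, noise), one plausible arrangement, again an assumption here, is to build `phi_target` from the target mask and `phi_other` from the sum of the interfering-speaker and noise masks, so that the beamformer suppresses both at once.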
Related papers
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019)
- A Robust Maximum Likelihood Distortionless Response Beamformer Based On A Complex Generalized Gaussian Distribution (2021)
- Improved MVDR Beamforming Using LSTM Speech Models To Clean Spatial Clustering Masks (2020)
- Speaker Adapted Beamforming For Multi-channel Automatic Speech Recognition (2018)
- Target Speaker Selection For Neural Network Beamforming In Multi-speaker Scenarios (2025)
- Multi-channel Multi-frame ADL-MVDR For Target Speech Separation (2020)
- ADL-MVDR: All Deep Learning MVDR Beamformer For Target Speech Separation (2020)
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019)