Improved MVDR Beamforming Using LSTM Speech Models To Clean Spatial Clustering Masks
2020 · Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, et al.
Abstract
Spatial clustering techniques can achieve significant multi-channel noise reduction across relatively arbitrary microphone configurations, but have difficulty incorporating a detailed speech/noise model. In contrast, LSTM neural networks have successfully been trained to recognize speech from noise on single-channel inputs, but have difficulty taking full advantage of the information in multi-channel recordings. This paper integrates these two approaches, training LSTM speech models to clean the masks generated by the Model-based EM Source Separation and Localization (MESSL) spatial clustering method. By doing so, it attains both the spatial separation performance and generality of multi-channel spatial clustering and the signal modeling performance of multiple parallel single-channel LSTM speech enhancers. Our experiments show that when our system is applied to the CHiME-3 dataset of noisy tablet recordings, it increases speech quality as measured by the Perceptual Evaluation of Speec
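The pipeline described in the abstract ends in a mask-driven MVDR beamformer, so a minimal NumPy sketch of that final step may help make it concrete. It assumes a time-frequency speech mask is already available (for instance, a spatial clustering mask that has been cleaned by a speech model) and does not show how that mask is produced. The function name `mask_based_mvdr`, the reference-channel MVDR formulation, and the diagonal loading term `eps` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def mask_based_mvdr(Y, speech_mask, ref_channel=0, eps=1e-6):
    """Illustrative mask-driven MVDR beamformer (hypothetical helper).

    Y           : complex STFT, shape (channels, freqs, frames)
    speech_mask : values in [0, 1], shape (freqs, frames)
    Returns the enhanced single-channel STFT, shape (freqs, frames).
    """
    n_ch = Y.shape[0]
    noise_mask = 1.0 - speech_mask

    # Mask-weighted spatial covariance matrices, one (C, C) matrix per frequency.
    phi_s = np.einsum("ft,cft,dft->fcd", speech_mask, Y, Y.conj())
    phi_n = np.einsum("ft,cft,dft->fcd", noise_mask, Y, Y.conj())
    phi_s /= speech_mask.sum(axis=1)[:, None, None] + eps
    phi_n /= noise_mask.sum(axis=1)[:, None, None] + eps

    # Diagonal loading keeps the noise covariance invertible (an assumption
    # of this sketch, not a step stated in the abstract).
    phi_n += eps * np.eye(n_ch)[None]

    # Reference-channel MVDR: w = (phi_n^-1 phi_s) u / trace(phi_n^-1 phi_s),
    # where u selects the reference microphone.
    numerator = np.linalg.solve(phi_n, phi_s)            # (F, C, C)
    trace = np.trace(numerator, axis1=1, axis2=2)        # (F,)
    weights = numerator[:, :, ref_channel] / (trace[:, None] + eps)  # (F, C)

    # Apply w^H y at every time-frequency point.
    return np.einsum("fc,cft->ft", weights.conj(), Y)
```

In this sketch, the quality of the beamformer output depends entirely on the mask: a cleaner speech mask gives better speech and noise covariance estimates, which is the stage the abstract's LSTM speech models are meant to improve.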
Related papers
- Combining Spatial Clustering With LSTM Speech Models For Multichannel Speech Enhancement (2020) · 0.00
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017) · 13.23
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019) · 13.23
- ADL-MVDR: All Deep Learning MVDR Beamformer For Target Speech Separation (2020) · 15.00
- Multi-talker MVDR Beamforming Based On Extended Complex Gaussian Mixture Model (2019) · 0.00
- Multichannel Loss Function For Supervised Speech Source Separation By Mask-based Beamforming (2019) · 7.50
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019) · 6.34
- Student-teacher Learning For BLSTM Mask-based Speech Enhancement (2018) · 9.59