Analyzing The Impact Of Speaker Localization Errors On Speech Separation For Automatic Speech Recognition
2019 · Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr
Abstract
We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer enhances the signal impinging from the speaker location. In the second stage, a neural network uses the DS output to estimate a time-frequency mask corresponding to the localized speaker. In the third stage, this mask is used to compute second-order statistics, from which an adaptive beamformer is derived. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix and studied the performance of the proposed pipeline in terms of word error rate (WER). An average WER of 29.4% was achieved using the ground-truth localization information and 42.4% using the localization information estimated via GCC-PHAT. The signal-to-interference ratio (SIR) between th…
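To make the three stages concrete, here is a minimal NumPy sketch. It rests on assumptions the abstract does not state: a far-field uniform linear array, a speed of sound of 343 m/s, STFT-domain processing, and a Souden-style mask-based MVDR standing in for the unspecified "adaptive beamformer". The stage-2 neural network is not reproduced; its mask is simply taken as an input. GCC-PHAT is included because it is the localization method whose estimates the abstract compares against ground truth. All function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value


def gcc_phat_tdoa(x_ref, x, fs, max_tau=None):
    """Delay of `x` relative to `x_ref` in seconds (positive: `x` arrives later)."""
    n = len(x_ref) + len(x)  # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(x, n=n) * np.conj(np.fft.rfft(x_ref, n=n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # Reorder so that index i corresponds to a lag of (i - max_shift) samples.
    cc = np.concatenate((cc[n - max_shift:], cc[: max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs


def tdoa_to_doa(tau, spacing, c=SPEED_OF_SOUND):
    """Far-field DOA in degrees for a mic pair; 0 = on the axis beyond the second mic."""
    return np.degrees(np.arccos(np.clip(-c * tau / spacing, -1.0, 1.0)))


def far_field_delays(mic_pos, theta_deg, c=SPEED_OF_SOUND):
    """Plane-wave arrival delay at each mic of a linear array, relative to position 0."""
    return -np.asarray(mic_pos) * np.cos(np.radians(theta_deg)) / c


def steering_vector(freqs, delays):
    """(F, M) array of phase terms exp(-j 2 pi f tau_m)."""
    return np.exp(-2j * np.pi * np.outer(freqs, delays))


def ds_weights(steer):
    """Stage 1: delay-and-sum weights with unit gain (w^H d = 1) in the look direction."""
    return steer / steer.shape[1]


def apply_weights(w, X):
    """Beamform an STFT X of shape (M, F, T) with weights w of shape (F, M): y = w^H x."""
    return np.einsum("fm,mft->ft", w.conj(), X)


def masked_scm(X, mask, eps=1e-10):
    """Mask-weighted spatial covariance (second-order statistics), shape (F, M, M)."""
    num = np.einsum("ft,mft,nft->fmn", mask, X, X.conj())
    return num / (mask.sum(axis=1)[:, None, None] + eps)


def mask_mvdr_weights(X, mask, ref=0, load=1e-6):
    """Stage 3 (assumed MVDR variant): w = phi_n^{-1} phi_s u_ref / tr(phi_n^{-1} phi_s)."""
    phi_s = masked_scm(X, mask)        # target statistics
    phi_n = masked_scm(X, 1.0 - mask)  # interference + noise statistics
    phi_n = phi_n + load * np.eye(X.shape[0])  # diagonal loading for invertibility
    ratio = np.linalg.solve(phi_n, phi_s)      # phi_n^{-1} phi_s at every frequency
    trace = np.trace(ratio, axis1=1, axis2=2)
    return ratio[:, :, ref] / (trace[:, None] + 1e-10)
```

In this sketch a localization error enters only through `ds_weights`: passing a GCC-PHAT estimate from `tdoa_to_doa` instead of the true DOA changes the DS output from which stage 2 would predict the mask, while the stage-3 beamformer is driven only by the mask-weighted statistics.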
Related papers
- SLOGD: Speaker Location Guided Deflation Approach To Speech Separation (2019) · 0.00
- 3D Neural Beamforming For Multi-channel Speech Separation Against Location Uncertainty (2023) · 0.00
- End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend (2021) · 10.97
- Locate And Beamform: Two-dimensional Locating All-neural Beamformer For Multi-channel Speech Separation (2023) · 3.58
- Mask-weighted Spatial Likelihood Coding For Speaker-independent Joint Localization And Mask Estimation (2024) · 0.00
- Deep Learning Based Multi-source Localization With Source Splitting And Its Effectiveness In Multi-talker Speech Recognition (2021) · 14.23
- Multi-channel Speaker Verification For Single And Multi-talker Speech (2020) · 0.00
- Exploring The Integration Of Speech Separation And Recognition With Self-supervised Learning Representation (2023) · 6.34