Speaker Selective Beamformer With Keyword Mask Estimation
2018 · Yusuke Kida, Dung Tran, Motoi Omachi, et al.
Abstract
This paper addresses automatic speech recognition (ASR) of a target speaker in the presence of background speech. The novelty of our approach is that it exploits the wakeup keyword, which is usually spoken to activate ASR systems such as smart speakers. The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword uttered by the target speaker and the remaining background speech. The separated signals are then used to calculate a beamforming filter that enhances the target speaker's subsequent utterances. Experimental evaluations show that the trained DNN-based mask estimator can selectively separate the keyword and the background speech from the mixture signal. The effectiveness of the proposed method is also verified in Japanese ASR experiments, which confirm that the proposed method significantly reduces character error rates on both simulated and real recorded test sets.
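The abstract states only that the separated keyword and background signals are used to calculate a beamforming filter; it does not say which filter. As an illustration, here is a minimal NumPy sketch of one common choice, mask-based MVDR in the Souden-style formulation, assuming the DNN's keyword mask and background mask are already available as time-frequency arrays. The function names (`masked_scm`, `mvdr_weights`, `apply_beamformer`), the `(freq, time, mic)` array layout, the reference microphone and the regularization constant are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def masked_scm(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    Assumed layout: stft is (F, T, M) complex, mask is (F, T) with values in [0, 1].
    Phi[f] = sum_t mask[f, t] * y[f, t] y[f, t]^H / sum_t mask[f, t]
    """
    weighted = np.einsum("ft,ftm,ftn->fmn", mask, stft, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)[:, None, None]  # avoid divide-by-zero
    return weighted / norm


def mvdr_weights(phi_target, phi_noise, ref_mic=0, eps=1e-6):
    """Souden-style MVDR: w[f] = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    ref_mic (the unit vector u) and eps are illustrative choices; eps
    regularizes the noise covariance so the solve stays well conditioned.
    """
    _, M, _ = phi_noise.shape
    reg = eps * np.eye(M)[None]
    num = np.linalg.solve(phi_noise + reg, phi_target)    # (F, M, M)
    trace = np.trace(num, axis1=1, axis2=2)[:, None]      # (F, 1)
    return num[:, :, ref_mic] / (trace + eps)             # (F, M)


def apply_beamformer(w, stft):
    """Enhanced single-channel STFT: x_hat[f, t] = w[f]^H y[f, t]."""
    return np.einsum("fm,ftm->ft", w.conj(), stft)
```

In the keyword-based setting described above, `phi_target` would be estimated over the keyword segment with the keyword mask and `phi_noise` with the background mask; the resulting `w` would then be held fixed and applied to the STFT of the following utterance. Freezing the filter after the keyword is what lets it keep favoring the speaker who said the keyword even when no per-frame target mask is available later.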
Related papers
- Speaker Adapted Beamforming For Multi-channel Automatic Speech Recognition (2018) · 5.84
- Target Speaker Selection For Neural Network Beamforming In Multi-speaker Scenarios (2025) · 0.00
- VoiceFilter: Targeted Voice Separation By Speaker-conditioned Spectrogram Masking (2018) · 17.48
- Mask-weighted Spatial Likelihood Coding For Speaker-independent Joint Localization And Mask Estimation (2024) · 0.00
- Optimization Of Speaker Extraction Neural Network With Magnitude And Temporal Spectrum Approximation Loss (2019) · 11.29
- Improving Speaker Discrimination Of Target Speech Extraction With Time-domain SpeakerBeam (2020) · 14.76
- Run-time Adaptation Of Neural Beamforming For Robust Speech Dereverberation And Denoising (2024) · 0.00
- Heimdal: Highly Efficient Method For Detection And Localization Of Wake-words (2022) · 3.58