Run-time Adaptation Of Neural Beamforming For Robust Speech Dereverberation And Denoising
2024 · Yoto Fujita, Aditya Arie Nugraha, Diego di Carlo, et al.
Abstract
This paper describes speech enhancement for real-time automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used for generating pseudo ground-truth data from a mixture. Based on this idea, a prior work proposed a dual-process system based on a cascade of WPE and minimum variance distortionless respo
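The mask-then-filter step mentioned in the abstract can be illustrated with a minimal sketch of generic mask-based MVDR beamforming. This is not the authors' system: the function name `mask_based_mvdr`, its parameters, the reference-channel MVDR formulation, and the diagonal loading are all assumptions made for illustration, and the speech mask is taken as given (in practice a DNN would estimate it).

```python
import numpy as np

def mask_based_mvdr(X, speech_mask, ref_mic=0, eps=1e-6):
    """Hypothetical helper: mask-based MVDR beamforming.

    X           : complex STFT of the mixture, shape (F, T, M)
    speech_mask : time-frequency speech mask in [0, 1], shape (F, T)
    Returns the enhanced single-channel STFT, shape (F, T).
    """
    F, T, M = X.shape
    noise_mask = 1.0 - speech_mask

    # Mask-weighted spatial covariance matrices, one (M, M) matrix per frequency.
    phi_s = np.einsum("ft,ftm,ftn->fmn", speech_mask, X, X.conj())
    phi_n = np.einsum("ft,ftm,ftn->fmn", noise_mask, X, X.conj())
    phi_s = phi_s / (speech_mask.sum(axis=1)[:, None, None] + eps)
    phi_n = phi_n / (noise_mask.sum(axis=1)[:, None, None] + eps)

    # Diagonal loading (an assumption here) keeps the noise covariance invertible.
    load = eps * np.trace(phi_n, axis1=1, axis2=2).real / M
    phi_n = phi_n + load[:, None, None] * np.eye(M)

    # Reference-channel MVDR filter: w = (phi_n^-1 phi_s) u / trace(phi_n^-1 phi_s),
    # where u selects the reference microphone.
    num = np.linalg.solve(phi_n, phi_s)              # (F, M, M)
    trace = np.trace(num, axis1=1, axis2=2)[:, None]  # (F, 1)
    w = num[:, :, ref_mic] / (trace + eps)           # (F, M)

    # Apply the filter: y(f, t) = w(f)^H x(f, t).
    return np.einsum("fm,ftm->ft", w.conj(), X)
```

Under these assumptions, a poor mask (for example, one produced by a DNN under mismatched conditions) directly corrupts both covariance estimates, which is one way to see why run-time adaptation of the mask estimator matters.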
Related papers
- DNN-free Low-latency Adaptive Speech Enhancement Based On Frame-online Beamforming Powered By Block-online FastMNMF (2022)
- Neural Network-augmented Kalman Filtering For Robust Online Speech Dereverberation In Noisy Reverberant Environments (2022)
- WPD++: An Improved Neural Beamformer For Simultaneous Speech Separation And Dereverberation (2020)
- Unsupervised Speech Enhancement Based On Multichannel NMF-informed Beamforming For Noise-robust Automatic Speech Recognition (2019)
- Speaker Adapted Beamforming For Multi-channel Automatic Speech Recognition (2018)
- End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend (2021)
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017)
- End-to-end Far-field Speech Recognition With Unified Dereverberation And Beamforming (2020)