Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition
2017 · Zhong Meng, Shinji Watanabe, John R. Hershey, et al.
Abstract
Far-field speech recognition in noisy and reverberant conditions remains a challenging problem despite recent deep learning breakthroughs. This problem is commonly addressed by acquiring a speech signal from multiple microphones and performing beamforming over them. In this paper, we propose to use a recurrent neural network with long short-term memory (LSTM) architecture to adaptively estimate real-time beamforming filter coefficients. This lets the beamformer cope with non-stationary environmental noise and with the dynamic nature of source and microphone positions, which results in a set of time-varying room impulse responses. The LSTM adaptive beamformer is jointly trained with a deep LSTM acoustic model to predict senone labels. Further, we use hidden units in the deep LSTM acoustic model to assist in predicting the beamforming filter coefficients. The proposed system achieves a 7.97% absolute gain over baseline systems with no beamforming on the CHiME-3 real evaluation set.
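To make the filtering step concrete, here is a minimal sketch of frame-wise filter-and-sum beamforming in the short-time Fourier transform (STFT) domain. It is an illustration under stated assumptions, not the authors' implementation: the function name `filter_and_sum`, the array layout, and the choice to apply weights without conjugation are all hypothetical, and the weights are plain inputs here rather than the output of an LSTM.

```python
import numpy as np

def filter_and_sum(stft, weights):
    """Apply per-frame, per-frequency complex weights to a multichannel STFT.

    stft:    complex array, shape (channels, frames, freq_bins)
    weights: complex array, same shape. In an adaptive beamformer these
             would be predicted frame by frame by a network; here they
             are simply passed in (assumption for illustration).
    returns: complex array, shape (frames, freq_bins)
    """
    if stft.shape != weights.shape:
        raise ValueError("stft and weights must have the same shape")
    # For each (frame, bin), sum weight * observation over the microphones.
    return np.sum(weights * stft, axis=0)

# Toy input: 2 microphones, 3 frames, 4 frequency bins of random data.
rng = np.random.default_rng(0)
channels, frames, freq_bins = 2, 3, 4
shape = (channels, frames, freq_bins)
x = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Uniform weights 1/channels reduce the beamformer to channel averaging.
w = np.full(shape, 1.0 / channels, dtype=complex)
y = filter_and_sum(x, w)
```

Because the weights carry a frame index, they can change from one frame to the next, which is what allows a beamformer of this form to follow time-varying room impulse responses.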
Related papers
- Deep Ad-hoc Beamforming (2018)
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024)
- Improved MVDR Beamforming Using LSTM Speech Models To Clean Spatial Clustering Masks (2020)
- Speaker Adapted Beamforming For Multi-channel Automatic Speech Recognition (2018)
- Fasnet: Low-latency Adaptive Beamforming For Multi-microphone Audio Processing (2019)
- 3-D Feature And Acoustic Modeling For Far-field Speech Recognition (2019)
- Run-time Adaptation Of Neural Beamforming For Robust Speech Dereverberation And Denoising (2024)
- Short-time Deep-learning Based Source Separation For Speech Enhancement In Reverberant Environments With Beamforming (2020)