Attention-based Neural Beamforming Layers For Multi-channel Speech Recognition
2021 · Bhargav Pulugundla, Yang Gao, Brian King, et al.
Abstract
Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable of capturing local information. In this work, we propose a 2D Conv-Attention module that combines convolutional neural networks with attention for beamforming. We apply self- and cross-attention to explicitly model the correlations within and between the input channels. The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers. We train and evaluate on an in-house multi-channel dataset. The results show a relative improvement of 3.8% in WER by the proposed model over the baseline neural beamformer.
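The distinction between self- and cross-attention over input channels can be illustrated with a minimal NumPy sketch. This is not the paper's 2D Conv-Attention module: the abstract gives no architectural details, so the sketch omits the convolutional front end, learned projections, and multiple heads. The feature shapes, the two-channel setup, and all function and variable names are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Plain scaled dot-product attention (no learned projections).

    query: (T_q, D), key: (T_k, D), value: (T_k, D).
    Returns the attended output (T_q, D) and the weights (T_q, T_k).
    """
    dim = query.shape[-1]
    scores = query @ key.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    return weights @ value, weights


# Hypothetical per-channel features: T frames, D feature bins per frame.
rng = np.random.default_rng(0)
T, D = 6, 8
channel_0 = rng.standard_normal((T, D))
channel_1 = rng.standard_normal((T, D))

# Self-attention: a channel attends to its own frames
# (correlations within a channel).
self_out, self_weights = attention(channel_0, channel_0, channel_0)

# Cross-attention: queries from one channel, keys and values from another
# (correlations between channels).
cross_out, cross_weights = attention(channel_0, channel_1, channel_1)
```

In both calls the output has one row per query frame, and each row of the weight matrix sums to 1. The only difference is where the keys and values come from, which is what separates within-channel from between-channel modeling in this simplified view.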
Related papers
- Spatial Attention For Far-field Speech Recognition With Deep Beamforming Neural Networks (2019) 0.00
- Locate And Beamform: Two-dimensional Locating All-neural Beamformer For Multi-channel Speech Separation (2023) 3.58
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024) 0.00
- Dual-path Transformer Based Neural Beamformer For Target Speech Extraction (2023) 0.00
- Embedding And Beamforming: All-neural Causal Beamformer For Multichannel Speech Enhancement (2021) 13.05
- Enhanced Neural Beamformer With Spatial Information For Target Speech Extraction (2023) 2.26
- Multichannel Speech Enhancement Without Beamforming (2021) 9.41
- Multi-channel End-to-end Neural Network For Speech Enhancement, Source Localization, And Voice Activity Detection (2022) 0.00