A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model
2024 · Dongdi Zhao, Jianbo Ma, Lu Lu, et al.
Abstract
Far-field speech recognition is a challenging task that conventionally relies on signal-processing beamforming to address noise and interference. However, performance is usually limited by heavy reliance on environmental assumptions. In this paper, we propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system, extending the end-to-end speech recognition system to include speech enhancement. The framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) is adopted to form the neural beamformer. Several pooling strategies for combining look directions are then compared to find the optimal approach. Moreover, source direction information is integrated into the beamforming to explore the usefulness of source direction as a prior, which is usuall
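As a rough illustration of the fCLP front end named in the abstract, the sketch below follows the general factored formulation: a per-look-direction spatial filter applied in the frequency domain, then a complex linear projection shared across directions with a log-magnitude output. The function names, the toy inputs, and the max and mean pooling modes are assumptions made for illustration; the abstract does not say which pooling strategies the paper compares, and this is not the authors' implementation.

```python
import math

def spatial_filter(frame, filters):
    """Apply one complex filter per look direction.

    frame[c][k]      : complex STFT bin k of microphone channel c
    filters[p][c][k] : complex weight for look direction p (hypothetical values)
    Returns y[p][k] = sum over c of frame[c][k] * filters[p][c][k].
    """
    n_bins = len(frame[0])
    return [
        [sum(frame[c][k] * h[c][k] for c in range(len(frame))) for k in range(n_bins)]
        for h in filters
    ]

def spectral_projection(y, proj, eps=1e-8):
    """Complex linear projection shared by all look directions.

    proj[f][k] : complex projection weight from bin k to feature f
    Returns z[p][f] = log(|sum over k of proj[f][k] * y[p][k]| + eps).
    """
    return [
        [math.log(abs(sum(w * v for w, v in zip(row, y_p))) + eps) for row in proj]
        for y_p in y
    ]

def pool_directions(z, mode="max"):
    """Combine look directions feature by feature (illustrative modes only)."""
    columns = list(zip(*z))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError("unknown pooling mode: " + mode)

# Toy input: 2 channels, 2 frequency bins, 2 look directions.
frame = [[1 + 0j, 0 + 1j], [1 + 0j, 0 - 1j]]
filters = [
    [[0.5, 0.5], [0.5, 0.5]],    # direction 0 averages the channels
    [[0.5, 0.5], [-0.5, -0.5]],  # direction 1 takes half their difference
]
proj = [[1, 0], [0, 1]]          # identity projection, for readability

y = spatial_filter(frame, filters)
z = spectral_projection(y, proj)
pooled_max = pool_directions(z, "max")
pooled_mean = pool_directions(z, "mean")
```

In a jointly trained system the filter and projection weights would be learned together with the recognizer rather than fixed by hand as they are here.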
Related papers
- Spatial Attention For Far-field Speech Recognition With Deep Beamforming Neural Networks (2019)
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017)
- Attention-based Neural Beamforming Layers For Multi-channel Speech Recognition (2021)
- End-to-end Far-field Speech Recognition With Unified Dereverberation And Beamforming (2020)
- Locate And Beamform: Two-dimensional Locating All-neural Beamformer For Multi-channel Speech Separation (2023)
- Dual-path Transformer Based Neural Beamformer For Target Speech Extraction (2023)
- Deep Ad-hoc Beamforming (2018)
- Self-attention Channel Combinator Frontend For End-to-end Multichannel Far-field Speech Recognition (2021)