Automatic Channel Selection And Spatial Feature Integration For Multi-channel Speech Recognition Across Various Array Topologies
2023 · Bingshen Mu, Pengcheng Guo, Dake Guo, et al.
Abstract
Automatic Speech Recognition (ASR) has made remarkable progress, yet it still faces challenges in real-world distant-speech scenarios involving various array topologies, each with multiple recording devices. The focus of the CHiME-7 Distant ASR task is to devise a unified system that generalizes across various array topologies with multiple recording devices and offers reliable recognition performance in real-world environments. To address this task, we introduce an ASR system that demonstrates exceptional performance across various array topologies. First, we propose two attention-based automatic channel selection modules that select the most advantageous subset of multi-channel signals from multiple recording devices for each utterance. Furthermore, we introduce inter-channel spatial features that augment multi-frame cross-channel attention, improving its awareness of spatial information. Finally, we propose a multi-layer convolution
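The abstract does not specify how the channel selection modules or the spatial features are computed. The following is a minimal, hypothetical sketch and not the authors' implementation. It assumes that each channel is summarized by a single utterance-level embedding, that the attention query is the mean embedding across channels, and that the top-k channels by attention weight are kept. It also uses the inter-channel phase difference (IPD) as one example of an inter-channel spatial feature. The names `select_channels` and `ipd_features` are invented for illustration.

```python
import math
import cmath
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_channels(channel_embeddings: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical attention-based channel selection (not the paper's design).

    Each channel is summarized by one utterance-level embedding. The query is
    assumed to be the mean embedding across channels. Channels are scored by
    scaled dot-product attention, and the indices of the k channels with the
    highest attention weights are returned in ascending order.
    """
    num_channels = len(channel_embeddings)
    if not 1 <= k <= num_channels:
        raise ValueError("k must be between 1 and the number of channels")
    dim = len(channel_embeddings[0])
    # Assumed query: the average of all channel embeddings.
    query = [sum(emb[d] for emb in channel_embeddings) / num_channels for d in range(dim)]
    # Scaled dot-product score of each channel against the query.
    scores = [sum(q * e for q, e in zip(query, emb)) / math.sqrt(dim)
              for emb in channel_embeddings]
    weights = softmax(scores)
    # Keep the k highest-weighted channels (ties keep the lower index first).
    top = sorted(range(num_channels), key=lambda c: weights[c], reverse=True)[:k]
    return sorted(top)


def ipd_features(ref_bins: Sequence[complex], other_bins: Sequence[complex]) -> List[float]:
    """IPD features for one STFT frame, assumed here as the spatial feature.

    The phase difference per frequency bin is taken between another channel
    and a reference channel; cos/sin encoding makes phase wrap-around harmless.
    Returns [cos(IPD_0), ..., cos(IPD_F-1), sin(IPD_0), ..., sin(IPD_F-1)].
    """
    ipd = [cmath.phase(o) - cmath.phase(r) for r, o in zip(ref_bins, other_bins)]
    return [math.cos(x) for x in ipd] + [math.sin(x) for x in ipd]
```

For example, with embeddings `[[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]` and `k=2`, the two channels that align with the mean embedding are kept, so `select_channels` returns `[0, 1]`. A learned query or learned projections would replace the fixed mean query in a trainable module.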
Related papers
- The CHiME-7 DASR Challenge: Distant Meeting Transcription With Multiple Devices In Diverse Scenarios (2023)
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019)
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019)
- Stream Attention-based Multi-array End-to-end Speech Recognition (2018)
- 3-D Feature And Acoustic Modeling For Far-field Speech Recognition (2019)
- MFCCA: Multi-frame Cross-channel Attention For Multi-speaker ASR In Multi-party Meeting Scenario (2022)
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023)
- Exploiting Single-channel Speech For Multi-channel End-to-end Speech Recognition (2021)