Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition
2019 · Kenichi Kumatani, Minhua Wu, Shiva Sundaram, et al.
Abstract
The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives. In this work, we propose to unify an acoustic model framework by optimizing spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input. Our acoustic model subsumes beamformers with multiple types of array geometry. In contrast to deep clustering methods that treat a neural network as a black box tool, the network encoding the spatial filters can process streaming audio data in real time without the accumulation of target signal statistics. We demonstrate the effectiveness of such MC neural networks thro
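As a rough illustration of what spatial filtering on multi-channel input means, the sketch below applies a fixed delay-and-sum beamformer to a single frequency bin for several look directions. It is not the learned network described in the abstract: the linear array layout, the speed of sound, and the function names `steering_vector` and `delay_and_sum_power` are assumptions made only for this example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second (assumed room-temperature value)


def steering_vector(mic_positions, angle_rad, freq_hz):
    """Far-field steering vector for microphones placed on a line.

    mic_positions: 1-D array of microphone coordinates in metres.
    angle_rad: direction of arrival measured from the array axis.
    """
    delays = mic_positions * np.cos(angle_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def delay_and_sum_power(x_freq, mic_positions, look_angles, freq_hz):
    """Beamformer output power for each look direction.

    x_freq: complex array of shape (num_mics,), one frequency bin of one frame.
    """
    powers = []
    for angle in look_angles:
        # Delay-and-sum weights: the steering vector scaled by 1 / num_mics.
        w = steering_vector(mic_positions, angle, freq_hz) / len(mic_positions)
        y = np.vdot(w, x_freq)  # np.vdot conjugates w, giving w^H x
        powers.append(np.abs(y) ** 2)
    return np.array(powers)


# Hypothetical 4-microphone linear array with 5 cm spacing.
mics = np.array([0.0, 0.05, 0.10, 0.15])
freq = 2000.0
source_angle = np.deg2rad(60.0)
looks = np.deg2rad([0.0, 60.0, 120.0])

# Simulate a unit-amplitude plane wave arriving from the source direction.
x = steering_vector(mics, source_angle, freq)
powers = delay_and_sum_power(x, mics, looks, freq)
best_look = int(np.argmax(powers))
```

In this sketch, a different array geometry changes only `mic_positions`, which is why fixed weights designed for one layout stop matching the signal when the test array differs from the design array.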
Related papers
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019)
- Automatic Channel Selection And Spatial Feature Integration For Multi-channel Speech Recognition Across Various Array Topologies (2023)
- One Model To Enhance Them All: Array Geometry Agnostic Multi-channel Personalized Speech Enhancement (2021)
- Neural Directed Speech Enhancement With Dual Microphone Array In High Noise Scenario (2024)
- 3-D Feature And Acoustic Modeling For Far-field Speech Recognition (2019)
- Leveraging Redundancy In Multiple Audio Signals For Far-field Speech Recognition (2023)
- Hierarchical Modeling Of Spatial Cues Via Spherical Harmonics For Multi-channel Speech Enhancement (2023)
- End-to-end Multi-channel Speaker Extraction And Binaural Speech Synthesis (2024)