3-D Feature And Acoustic Modeling For Far-field Speech Recognition
2019 · Anurenjan Purushothaman, Anirudh Sreeram, Sriram Ganapathy
Abstract
Automatic speech recognition in multi-channel reverberant conditions is a challenging task. The conventional way of suppressing reverberation artifacts is beamforming-based enhancement of the multi-channel speech signal; the enhanced signal is then used to extract spectrogram-based features for a neural network acoustic model. In this paper, we propose to extract features directly from the multi-channel speech signal using a multivariate autoregressive (MAR) modeling approach, which exploits the correlations among all three dimensions of time, frequency and channel. The MAR features are fed to a convolutional neural network (CNN) architecture that performs joint acoustic modeling over the three dimensions. The 3-D CNN architecture allows the multi-channel features to be combined in a way that optimizes the speech recognition cost, in contrast to traditional beamforming models, which focus on the enhancement task. Experiments are conducted on the CHiME-3 and REVERB Challenge datasets using mul
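To make the idea of joint modeling over channel, time and frequency concrete, here is a minimal sketch of a single 3-D convolution (valid cross-correlation) in plain Python. It is not the authors' architecture: the function name `conv3d_valid`, the toy tensor shape and the averaging kernel are all assumptions chosen for illustration. The point is only that one kernel spans all three axes at once, so each output value mixes information across microphones, frames and frequency bands.

```python
def conv3d_valid(x, k):
    """Valid 3-D cross-correlation.

    x: nested list indexed as x[channel][time][frequency]
    k: nested list kernel indexed the same way (hypothetical layout)
    """
    C, T, F = len(x), len(x[0]), len(x[0][0])
    kc, kt, kf = len(k), len(k[0]), len(k[0][0])
    out = []
    for c in range(C - kc + 1):
        plane = []
        for t in range(T - kt + 1):
            row = []
            for f in range(F - kf + 1):
                s = 0.0
                # One output value sums over a channel x time x frequency patch
                for dc in range(kc):
                    for dt in range(kt):
                        for df in range(kf):
                            s += x[c + dc][t + dt][f + df] * k[dc][dt][df]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Toy input (assumed shape): 2 channels, 3 frames, 3 frequency bands
features = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
# Averaging kernel spanning 2 channels x 2 frames x 2 bands
kernel = [[[0.125] * 2 for _ in range(2)] for _ in range(2)]

out = conv3d_valid(features, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # output shape: 1 2 2
```

In a trained network the kernel weights would be learned against the recognition objective rather than fixed, which is where this differs from a separately optimized beamforming front end.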
Related papers
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019) 9.92
- Automatic Channel Selection And Spatial Feature Integration For Multi-channel Speech Recognition Across Various Array Topologies (2023) 8.09
- Deep Long Short-term Memory Adaptive Beamforming Networks For Multichannel Robust Speech Recognition (2017) 13.23
- RIR-SF: Room Impulse Response Based Spatial Feature For Target Speech Recognition In Multi-channel Multi-speaker Scenarios (2023) 0.00
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019) 6.34
- On Combining Features For Single-channel Robust Speech Recognition In Reverberant Environments (2019) 0.00
- Ensemble Of Jointly Trained Deep Neural Network-based Acoustic Models For Reverberant Speech Recognition (2016) 0.00
- 3D Neural Beamforming For Multi-channel Speech Separation Against Location Uncertainty (2023) 0.00