Leveraging Redundancy In Multiple Audio Signals For Far-field Speech Recognition
2023 · Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, et al.
Abstract
To achieve robust far-field automatic speech recognition (ASR), existing techniques typically cascade an acoustic front end (AFE) with a neural transducer (NT) ASR model. The AFE output, however, can be unreliable, for example when the beamformer in the AFE is steered in the wrong direction. A promising way to address this issue is to exploit the microphone signals taken after acoustic echo cancellation (post-AEC) but before the beamforming stage of the AFE. We argue that the post-AEC and AFE outputs are complementary, and that the redundancy between these signals can be leveraged to compensate for potential AFE processing errors. We present two fusion networks that exploit this redundancy by aggregating these multi-channel (MC) signals: (1) a Frequency-LSTM based fusion network and (2) a Convolutional Neural Network (CNN) based fusion network. We attach the MC fusion networks to a conformer transducer model and train the combined model end to end. Our experimental results on commercial virtual assistant tasks de…
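The abstract does not give layer details for either fusion network. As a rough illustration only, the sketch below treats each signal (for example, the post-AEC microphone channels and the AFE output) as one input channel of time-frequency features and fuses them into a single feature map with one 2-D convolution layer, whose kernel weights would in practice be learned end to end together with the transducer. The function name `conv_fuse`, the T x F feature layout, the single linear layer, and the zero "same" padding are all assumptions made for illustration; this is not the authors' CNN fusion network.

```python
from typing import List

# Hypothetical layout: one T x F feature map (e.g. log-mel frames) per signal.
Matrix = List[List[float]]


def conv_fuse(channels: List[Matrix], kernels: List[Matrix], bias: float = 0.0) -> Matrix:
    """Fuse C input feature maps into one with a single linear 2-D convolution.

    channels: C maps, each T x F (illustrative: post-AEC mics plus the AFE output).
    kernels:  C kernels, each KT x KF with odd sizes, one per input channel.
    Computes cross-correlation, as most deep-learning libraries do for "convolution".
    Zero "same" padding keeps the output T x F, so frames stay time-aligned.
    """
    if len(channels) != len(kernels):
        raise ValueError("need one kernel per input channel")
    T, F = len(channels[0]), len(channels[0][0])
    KT, KF = len(kernels[0]), len(kernels[0][0])
    pt, pf = KT // 2, KF // 2  # half-widths used for centred, zero-padded windows
    out = [[bias] * F for _ in range(T)]
    for x, k in zip(channels, kernels):
        for t in range(T):
            for f in range(F):
                acc = 0.0
                for i in range(KT):
                    for j in range(KF):
                        tt, ff = t + i - pt, f + j - pf
                        # Positions outside the map contribute zero (zero padding).
                        if 0 <= tt < T and 0 <= ff < F:
                            acc += k[i][j] * x[tt][ff]
                # Summing over input channels is what performs the fusion.
                out[t][f] += acc
    return out


if __name__ == "__main__":
    afe = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    mic = [[3.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
    # 1 x 1 kernels of 0.5 reduce the layer to an equal-weight average of the two signals.
    print(conv_fuse([afe, mic], [[[0.5]], [[0.5]]]))  # [[2.0, 2.0, 2.0], [2.0, 3.0, 4.0]]
```

With 1 x 1 kernels the layer only learns per-signal weights; larger kernels would let it pool evidence across neighbouring frames and frequency bins, which is one plausible way a learned fusion could down-weight a mis-steered AFE output in favour of the raw post-AEC channels.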
Related papers
- Frequency Domain Multi-channel Acoustic Modeling For Distant Speech Recognition (2019) · 9.92
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024) · 0.00
- Multi-geometry Spatial Acoustic Modeling For Distant Speech Recognition (2019) · 6.34
- Deep Residual Echo Suppression And Noise Reduction: A Multi-input FCRN Approach In A Hybrid Speech Enhancement System (2021) · 8.09
- A Network Of Deep Neural Networks For Distant Speech Recognition (2017) · 10.35
- Improved Far-field Speech Recognition Using Joint Variational Autoencoder (2022) · 0.00
- A Universally-deployable ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement, And Voice Separation (2022) · 5.84
- Dereverberation Of Autoregressive Envelopes For Far-field Speech Recognition (2021) · 6.77