Overlapped Speech Recognition From A Jointly Learned Multi-channel Neural Speech Extraction And Representation
2019 · Bo Wu, Meng Yu, Lianwu Chen, et al.
Abstract
We propose an end-to-end joint optimization framework of multi-channel neural speech extraction and a deep acoustic model without mel-filterbank (FBANK) extraction for overlapped speech recognition. First, based on a multi-channel convolutional TasNet with an STFT kernel, we unify the multi-channel target speech enhancement front-end network and a convolutional, long short-term memory, and fully connected deep neural network (CLDNN) based acoustic model (AM) with the FBANK extraction layer to build a hybrid neural network, which is thus jointly updated only by the recognition loss. The proposed framework achieves a 28% word error rate reduction (WERR) over a separately optimized system on AISHELL-1 and shows consistent robustness to signal-to-interference ratio (SIR) and angle difference between overlapping speakers. Next, further exploration shows that speech recognition is improved with a simplified structure by replacing the FBANK extraction layer in the joint model with a learnable
Related papers
- Distortionless Multi-channel Target Speech Enhancement For Overlapped Speech Recognition (2020)
- Progressive Joint Modeling In Unsupervised Single-channel Overlapped Speech Recognition (2017)
- End-to-end Multi-speaker Speech Recognition Using Speaker Embeddings And Transfer Learning (2019)
- A Unified Multichannel Far-field Speech Recognition System: Combining Neural Beamforming With Attention Based End-to-end Model (2024)
- Incorporating Multi-target In Multi-stage Speech Enhancement Model For Better Generalization (2021)
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- A Conformer-based ASR Frontend For Joint Acoustic Echo Cancellation, Speech Enhancement And Speech Separation (2021)
- Multi-channel Target Speech Extraction With Channel Decorrelation And Target Speaker Adaptation (2020)