End-to-end Dereverberation, Beamforming, And Speech Recognition With Improved Numerical Stability And Advanced Frontend
2021 · Wangyou Zhang, Christoph Boeddeker, Shinji Watanabe, et al.
Abstract
Recently, the end-to-end approach has been successfully applied to multi-speaker speech separation and recognition in both single-channel and multichannel conditions. However, severe performance degradation is still observed in reverberant and noisy scenarios, and a large performance gap remains between anechoic and reverberant conditions. In this work, we focus on the multichannel multi-speaker reverberant condition, and propose to extend our previous framework for end-to-end dereverberation, beamforming, and speech recognition with improved numerical stability and advanced frontend subnetworks, including voice-activity-detection-like masks. These techniques significantly stabilize the end-to-end training process. Experiments on the spatialized wsj1-2mix corpus show that the proposed system achieves about 35% relative WER reduction compared to our conventional multichannel E2E ASR system, and also obtains decent speech dereverberation and separation performance (SDR=12.5
Related papers
- End-to-end Far-field Speech Recognition With Unified Dereverberation And Beamforming (2020)
- An Investigation Of End-to-end Multichannel Speech Recognition For Reverberant And Mismatch Conditions (2019)
- End-to-end Integration Of Speech Recognition, Dereverberation, Beamforming, And Self-supervised Learning Representation (2022)
- Multi-channel Target Speech Extraction With Channel Decorrelation And Target Speaker Adaptation (2020)
- Elevating Robust Multi-talker ASR By Decoupling Speaker Separation And Speech Recognition (2025)
- End-to-end Multi-channel Speaker Extraction And Binaural Speech Synthesis (2024)
- Investigation Of Practical Aspects Of Single Channel Speech Separation For ASR (2021)
- WPD++: An Improved Neural Beamformer For Simultaneous Speech Separation And Dereverberation (2020)