Towards A Competitive End-to-end Speech Recognition For CHiME-6 Dinner Party Transcription
2020 · Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov
Abstract
While end-to-end ASR systems have proven competitive with the conventional hybrid approach, they are prone to accuracy degradation in noisy and low-resource conditions. In this paper, we argue that, even in such difficult cases, some end-to-end approaches show performance close to the hybrid baseline. To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging environments and noisy conditions of everyday speech. We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches, along with RNN versus Transformer architectures. We also compare acoustic features and speech enhancement methods. In addition, we evaluate the effectiveness of neural network language models for hypothesis re-scoring in low-resource conditions. Our best end-to-end model, based on RNN-Transducer with improved beam search, is only 3.8% absolute WER worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline. With the Guided Source Sepa
Related papers
- Exploring Neural Transducers For End-to-end Speech Recognition (2017)
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019)
- Speaker Reinforcement Using Target Source Extraction For Robust Automatic Speech Recognition (2022)
- A Comparison Of End-to-end Models For Long-form Speech Recognition (2019)
- End-to-end Target Speaker Speech Recognition Using Context-aware Attention Mechanisms For Challenging Enrollment Scenario (2025)
- Integrating Text Inputs For Training And Adapting RNN Transducer ASR Models (2022)
- Improving RNN Transducer Based ASR With Auxiliary Tasks (2020)
- Self-attention Transducers For End-to-end Speech Recognition (2019)