Have Best Of Both Worlds: Two-pass Hybrid And E2E Cascading Framework For Speech Recognition
2021 · Guoli Ye, Vadim Mazalov, Jinyu Li, et al.
Abstract
Hybrid and end-to-end (E2E) systems each have their own advantages and produce different error patterns in speech recognition results. By jointly modeling audio and text, the E2E model performs better in matched scenarios and scales well with large amounts of paired audio-text training data. The modularized hybrid model is easier to customize and makes better use of massive amounts of unpaired text data. This paper proposes a two-pass hybrid and E2E cascading (HEC) framework that combines the hybrid and E2E models to take advantage of both, with the hybrid model in the first pass and the E2E model in the second pass. We show that the proposed system achieves an 8-10% relative word error rate reduction with respect to each individual system. More importantly, compared with the pure E2E system, we show that the proposed system has the potential to keep the advantages of the hybrid system, e.g., customization and segmentation capabilities. We also show the second pass E2E model in HEC is robust with
Related papers
- Two-pass Decoding And Cross-adaptation Based System Combination Of End-to-end Conformer And Hybrid TDNN ASR Systems (2022)
- Recent Advances In End-to-end Automatic Speech Recognition (2021)
- Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020)
- Unified End-to-end Speech Recognition And Endpointing For Fast And Efficient Speech Systems (2022)
- When End-to-end Is Overkill: Rethinking Cascaded Speech-to-text Translation (2025)
- Two-pass End-to-end Speech Recognition (2019)
- CAT: A CTC-CRF Based ASR Toolkit Bridging The Hybrid And The End-to-end Approaches Towards Data Efficiency And Low Latency (2020)
- Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019)