Recent Advances In End-to-end Automatic Speech Recognition
2021 · Jinyu Li
Abstract
The speech community is moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art ASR accuracy on most benchmarks, hybrid models are still used in a large proportion of commercial ASR systems. Many practical factors affect the decision to deploy a model in production, and traditional hybrid models, having been optimized for production for decades, usually handle these factors well. Unless E2E models offer excellent solutions to all of these factors, they will be hard to commercialize widely. In this paper, we overview recent advances in E2E models, focusing on technologies that address those challenges from the industry's perspective.
Related papers
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023)
- On The Comparison Of Popular End-to-end Models For Large Scale Speech Recognition (2020)
- Survey Of End-to-end Multi-speaker Automatic Speech Recognition For Monaural Audio (2025)
- A Comparison Of End-to-end Models For Long-form Speech Recognition (2019)
- Recognizing Long-form Speech Using Streaming End-to-end Models (2019)
- Have Best Of Both Worlds: Two-pass Hybrid And E2E Cascading Framework For Speech Recognition (2021)
- A Comparative Study On Neural Architectures And Training Methods For Japanese Speech Recognition (2021)
- Unified End-to-end Speech Recognition And Endpointing For Fast And Efficient Speech Systems (2022)