WeNet: Production Oriented Streaming And Non-streaming End-to-end Speech Recognition Toolkit
2021 · Zhuoyuan Yao, Di Wu, Xiong Wang, et al.
Abstract
In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet, in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model. The main motivation of WeNet is to close the gap between the research and the production of E2E speech recognition models. WeNet provides an efficient way to ship ASR applications in several real-world scenarios, which is its main difference from, and advantage over, other open source E2E speech recognition toolkits. In our toolkit, a new two-pass method is implemented. Our method proposes a dynamic chunk-based attention strategy for the transformer layers that allows the right context length to be modified arbitrarily in a hybrid CTC/attention architecture. The inference latency can be easily controlled by changing only the chunk size. The CTC hypotheses are then rescored by the attention decoder to get the final result. Our experiments on the AISHELL
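The chunk-based attention idea in the abstract can be illustrated with a small mask builder. This is a minimal sketch under stated assumptions, not code from the toolkit: the function name `chunk_attention_mask` and its parameters `chunk_size` and `num_left_chunks` are hypothetical, and the rule assumed here is that each frame attends to every frame in its own chunk plus a configurable number of earlier chunks.

```python
def chunk_attention_mask(num_frames, chunk_size, num_left_chunks=-1):
    """Build a boolean mask where mask[i][j] is True if frame i may attend to frame j.

    Hypothetical helper for illustration only (not a WeNet API).
    num_left_chunks < 0 means "all previous chunks are visible".
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    mask = []
    for i in range(num_frames):
        chunk_idx = i // chunk_size
        # Right context ends at the end of the frame's own chunk.
        end = min((chunk_idx + 1) * chunk_size, num_frames)
        # Left context is either unlimited or a fixed number of chunks.
        if num_left_chunks < 0:
            start = 0
        else:
            start = max((chunk_idx - num_left_chunks) * chunk_size, 0)
        mask.append([start <= j < end for j in range(num_frames)])
    return mask


if __name__ == "__main__":
    # Small chunks give streaming-style attention with limited look-ahead.
    for row in chunk_attention_mask(num_frames=4, chunk_size=2):
        print(["x" if allowed else "." for allowed in row])
```

Under this assumed rule, a chunk size equal to the utterance length yields full attention (the non-streaming case), while a smaller chunk size bounds how far ahead each frame can look, which is how the chunk size would act as the latency knob.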
Related papers
- WNARS: WFST Based Non-autoregressive Streaming End-to-end Speech Recognition (2021) 0.00
- Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020) 0.00
- SpeechNet: Weakly Supervised, End-to-end Speech Recognition At Industrial Scale (2022) 0.00
- ESPnet-SE: End-to-end Speech Enhancement And Separation Toolkit Designed For ASR Integration (2020) 13.55
- Two-pass End-to-end Speech Recognition (2019) 13.97
- Unified End-to-end Speech Recognition And Endpointing For Fast And Efficient Speech Systems (2022) 5.24
- ESPnet-TTS: Unified, Reproducible, And Integratable Open Source End-to-end Text-to-speech Toolkit (2019) 23.32
- Multi-stream End-to-end Speech Recognition (2019) 8.35