WNARS: WFST Based Non-autoregressive Streaming End-to-end Speech Recognition
2021 · Zhichao Wang, Wenwen Yang, Pan Zhou, et al.
Abstract
Attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn increasing attention in automatic speech recognition (ASR). However, AED models still have drawbacks when deployed in commercial applications. Autoregressive beam search decoding makes them inefficient for high-concurrency applications, and integrating external word-level language models is inconvenient. Most importantly, AED models are difficult to use for streaming recognition because of their global attention mechanism. In this paper, we propose a novel framework, WNARS, which uses hybrid CTC-attention AED models and weighted finite-state transducers (WFST) to solve these problems together. We switch from autoregressive beam search to CTC branch decoding, which performs first-pass decoding with WFST in a chunk-wise streaming way. The decoder branch then performs second-pass rescoring on the generated hypotheses non-autoregressively. On the AISHELL-1 task, our WNARS achieves a charac
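The two-pass idea in the abstract can be illustrated with a minimal sketch. It assumes the first pass has already produced an n-best list with scores, and that the decoder can return log-probabilities for every token of a complete hypothesis in a single forward pass, which is what makes the rescoring non-autoregressive. The names `rescore`, `toy_decoder_logprobs`, and `weight`, and the linear score interpolation, are hypothetical choices for illustration; the abstract does not state the paper's actual scoring formula.

```python
import math

def rescore(hypotheses, decoder_logprobs, weight=0.5):
    """Return the best hypothesis after combining both passes.

    hypotheses: list of (tokens, first_pass_score) pairs, standing in
        for the output of first-pass CTC/WFST decoding.
    decoder_logprobs: callable mapping a full token sequence to a list
        of per-token log-probabilities (one call per hypothesis, no
        token-by-token generation).
    weight: hypothetical interpolation weight for the decoder score.
    """
    best_tokens, best_score = None, -math.inf
    for tokens, first_pass_score in hypotheses:
        # Second-pass score: sum of log-probs for the whole hypothesis.
        second_pass_score = sum(decoder_logprobs(tokens))
        score = (1.0 - weight) * first_pass_score + weight * second_pass_score
        if score > best_score:
            best_tokens, best_score = tokens, score
    return best_tokens, best_score

# Toy stand-in for the decoder branch: fixed log-probs per token.
TOY_LOGPROBS = {"ni": -0.1, "hao": -0.2, "hau": -2.0}

def toy_decoder_logprobs(tokens):
    return [TOY_LOGPROBS.get(token, -5.0) for token in tokens]

# Made-up first-pass n-best list: the top candidate contains an error.
nbest = [(["ni", "hau"], -1.0), (["ni", "hao"], -1.2)]
best, score = rescore(nbest, toy_decoder_logprobs, weight=0.5)
print(best, round(score, 2))
```

In this toy input the second hypothesis trails after the first pass but wins once the decoder score is added, which is the kind of correction second-pass rescoring is meant to provide.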
Related papers
- Cascaded Encoders For Unifying Streaming And Non-streaming ASR (2020) 12.47
- Multi-stream End-to-end Speech Recognition (2019) 8.35
- Stream Attention-based Multi-array End-to-end Speech Recognition (2018) 0.00
- WeNet: Production Oriented Streaming And Non-streaming End-to-end Speech Recognition Toolkit (2021) 17.27
- Improving Non-autoregressive End-to-end Speech Recognition With Pre-trained Acoustic And Language Models (2022) 10.07
- Recognizing Long-form Speech Using Streaming End-to-end Models (2019) 13.74
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00
- Breaking Through The Spike: Spike Window Decoding For Accelerated And Precise Automatic Speech Recognition (2025) 0.00