Streaming Parallel Transducer Beam Search With Fast-slow Cascaded Encoders
2022 · Jay Mahadeokar, Yangyang Shi, Ke Li, et al.
Abstract
Streaming ASR with strict latency constraints is required in many speech recognition applications. To achieve the required latency, streaming ASR models sacrifice accuracy compared to non-streaming ASR models because they lack future input context. Previous research has shown that streaming and non-streaming ASR for RNN Transducers can be unified by cascading causal and non-causal encoders. This work improves upon this cascaded encoders framework by leveraging two streaming non-causal encoders with variable input context sizes that can produce outputs at different audio intervals (e.g., fast and slow). We propose a novel parallel time-synchronous beam search algorithm for transducers that decodes from fast-slow encoders, where the slow encoder corrects the mistakes made by the fast encoder. The proposed algorithm achieves up to 20% WER reduction with a slight increase in token emission delays on the public Librispeech dataset and in-house datasets. We also explore techniqu
Related papers
- Integration Of Frame- And Label-synchronous Beam Search For Streaming Encoder-decoder Speech Recognition (2023)
- Cascaded Encoders For Unifying Streaming And Non-streaming ASR (2020)
- Navigating The Minefield Of MT Beam Search In Cascaded Streaming Speech Translation (2024)
- Run-and-back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-decoder ASR (2022)
- Minimum Latency Training Strategies For Streaming Sequence-to-sequence ASR (2020)
- Vectorization Of Hypotheses And Speech For Faster Beam Search In Encoder Decoder-based Speech Recognition (2018)
- Mask-CTC-based Encoder Pre-training For Streaming End-to-end Speech Recognition (2023)
- Segment-level Vectorized Beam Search Based On Partially Autoregressive Inference (2023)