Lookahead When It Matters: Adaptive Non-causal Transformers For Streaming Neural Transducers
2023 · Grant P. Strimel, Yi Xie, Brian King, et al.
Abstract
Streaming speech recognition architectures are employed for low-latency, real-time applications. Such architectures are often characterized by their causality. Causal architectures emit tokens at each frame, relying only on current and past signal, while non-causal models are exposed to a window of future frames at each step to increase predictive accuracy. This dichotomy amounts to a trade-off for real-time Automatic Speech Recognition (ASR) system design: profit from the low-latency benefit of strictly-causal architectures while accepting predictive performance limitations, or realize the modeling benefits of future-context models accompanied by their higher latency penalty. In this work, we relax the constraints of this choice and present the Adaptive Non-Causal Attention Transducer (ANCAT). Our architecture is non-causal in the traditional sense, but executes in a low-latency, streaming manner by dynamically choosing when to rely on future context and to what degree within the audi…
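The causal versus non-causal distinction described in the abstract can be made concrete as a self-attention mask over audio frames. The sketch below is an illustration under stated assumptions, not the ANCAT method: the helper names `attention_mask` and `future_frames_used` are hypothetical, a single attention layer over frame indices is assumed, and each frame's lookahead is supplied by hand rather than chosen by a learned mechanism. A lookahead of 0 everywhere gives a strictly causal model, a constant lookahead gives a fixed future-context window, and per-frame values mimic the idea of deciding *when* and *how much* to look ahead.

```python
from typing import List, Sequence


def attention_mask(num_frames: int, lookahead: Sequence[int]) -> List[List[bool]]:
    """Return mask[t][s] == True if frame t may attend to frame s.

    Hypothetical helper. lookahead[t] is the number of future frames
    frame t may see: 0 -> strictly causal row, constant k > 0 -> fixed
    non-causal window, varying values -> per-frame (adaptive) window.
    """
    if len(lookahead) != num_frames:
        raise ValueError("need one lookahead value per frame")
    if any(k < 0 for k in lookahead):
        raise ValueError("lookahead values must be non-negative")
    mask = []
    for t in range(num_frames):
        # Past and current frames are always visible; future frames
        # are visible only up to index t + lookahead[t] (clipped to the end).
        limit = min(num_frames - 1, t + lookahead[t])
        mask.append([s <= limit for s in range(num_frames)])
    return mask


def future_frames_used(lookahead: Sequence[int], num_frames: int) -> int:
    """Total future frames exposed across all steps.

    A crude proxy for how much future context a configuration consumes;
    it is not a latency measurement.
    """
    return sum(min(k, num_frames - 1 - t) for t, k in enumerate(lookahead))


if __name__ == "__main__":
    T = 5
    configs = {
        "causal": [0] * T,
        "fixed lookahead 2": [2] * T,
        "adaptive (hand-picked)": [0, 0, 2, 0, 1],  # hypothetical per-frame choices
    }
    for name, la in configs.items():
        print(name, "future frames used:", future_frames_used(la, T))
        for row in attention_mask(T, la):
            print("  " + "".join("x" if v else "." for v in row))
```

In this toy setting the causal configuration exposes no future frames, the fixed window of 2 exposes 7 in total, and the hand-picked adaptive schedule exposes only 2, spending lookahead on a single frame. That is the kind of trade-off the abstract describes, though in a real model the per-frame choices would have to be predicted rather than given.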
Related papers
- Streaming Transformer Transducer Based Speech Recognition Using Non-causal Convolution (2021) · 8.82
- Mask-ctc-based Encoder Pre-training For Streaming End-to-end Speech Recognition (2023) · 0.00
- An Investigation Of Enhancing CTC Model For Triggered Attention-based Streaming ASR (2021) · 0.00
- Whisperrt -- Turning Whisper Into A Causal Streaming Model (2025) · 0.00
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020) · 11.08
- Streaming Parallel Transducer Beam Search With Fast-slow Cascaded Encoders (2022) · 0.00
- Latency-controlled Neural Architecture Search For Streaming Speech Recognition (2021) · 0.00
- CUSIDE-T: Chunking, Simulating Future And Decoding For Transducer Based Streaming ASR (2024) · 2.26