Transformer-based Online Speech Recognition With Decoder-end Adaptive Computation Steps
2020 · Mohan Li, Catalin Zorila, Rama Doddipatla
Abstract
Transformer-based end-to-end (E2E) automatic speech recognition (ASR) systems have recently gained wide popularity and have been shown to outperform E2E models based on recurrent structures on a number of ASR tasks. However, like other E2E models, Transformer ASR requires the full input sequence to compute attention in both the encoder and the decoder, which increases latency and poses a challenge for online ASR. The paper proposes the Decoder-end Adaptive Computation Steps (DACS) algorithm to reduce latency and enable online ASR. The proposed algorithm streams the decoding of Transformer ASR by triggering an output once the confidence accumulated from the encoder states reaches a certain threshold. Unlike other monotonic attention mechanisms, which risk visiting the entire encoder states for each output step, DACS introduces a maximum look-ahead step to prevent the decoder from reaching the end of speech too quickly. A chunkwise encoder is adopted in ou…
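The threshold-triggered halting with a maximum look-ahead can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that each encoder frame's halting probability is a sigmoid of a precomputed attention energy, that the threshold is 1.0, and that the last weight is truncated so the weights sum to the threshold (the rule used in the earlier ACS work; the paper's exact normalization may differ). The function name `dacs_step` and the parameters `prev_halt` and `max_look_ahead` are illustrative names, and the default values are arbitrary.

```python
import numpy as np


def dacs_step(energies, prev_halt, threshold=1.0, max_look_ahead=16):
    """Hypothetical sketch of one DACS output step for a single attention head.

    energies: attention energies between the current decoder query and each
        encoder frame received so far (assumed precomputed).
    prev_halt: index of the frame at which the previous output step halted.
    Returns (halt_index, weights); weights has the same length as energies
    and sums to 1.
    """
    energies = np.asarray(energies, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-energies))  # per-frame halting probability
    # Assumed look-ahead rule: never search more than max_look_ahead frames
    # past the previous halting point, nor past the last available frame.
    limit = min(len(energies) - 1, prev_halt + max_look_ahead)
    weights = np.zeros_like(probs)
    acc = 0.0
    for j in range(limit + 1):
        if acc + probs[j] >= threshold:
            # Confidence reached the threshold: trigger the output here and
            # truncate the last weight so the weights sum to the threshold.
            weights[j] = threshold - acc
            return j, weights / threshold
        weights[j] = probs[j]
        acc += probs[j]
    # Threshold not reached within the look-ahead window: force a halt at the
    # limit and renormalize the accumulated weights.
    return limit, weights / max(acc, 1e-8)


halt, w = dacs_step(np.zeros(6), prev_halt=0)
print(halt, w)  # halts at frame 1 with weights 0.5, 0.5, 0, 0, 0, 0
```

With all energies at zero, each frame contributes a halting probability of 0.5, so the accumulated confidence reaches 1.0 at the second frame and the output is triggered there without inspecting the remaining frames. The look-ahead cap is what bounds latency: even when the probabilities stay small, the search stops `max_look_ahead` frames after the previous halt instead of running to the end of the utterance.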
Related papers
- Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture (2020)
- End-to-end Speech Recognition With Adaptive Computation Steps (2018)
- Towards Online End-to-end Transformer Automatic Speech Recognition (2019)
- Effective Decoder Masking For Transformer Based End-to-end Speech Recognition (2020)
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020)
- Synchronous Transformers For End-to-end Speech Recognition (2019)
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021)
- Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition (2023)