End-to-end Speech Recognition With Adaptive Computation Steps
2018 · Mohan Li, Min Liu, Masanori Hattori
Abstract
In this paper, we present the Adaptive Computation Steps (ACS) algorithm, which enables end-to-end speech recognition models to dynamically decide how many frames should be processed to predict a linguistic output. A model that applies the ACS algorithm follows the encoder-decoder framework, but unlike attention-based models, it produces alignments independently at the encoder side using the correlation between adjacent frames. Thus, predictions can be made as soon as sufficient acoustic information is received, which makes the model applicable in online cases. In addition, a small change is made to the decoding stage of the encoder-decoder framework, which allows the prediction to exploit bidirectional contexts. We verify the ACS algorithm on the Mandarin speech corpus AIShell-1, where it achieves a 31.2% CER in the online setting, compared to the 32.4% CER of the attention-based model. To fully demonstrate the advantage of the ACS algorithm, offline experiments are conducted, in which our AC
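The idea of letting the encoder decide how many frames to consume per output can be illustrated with a small sketch. The abstract does not spell out the decision rule, so the code below assumes a simple accumulate-and-threshold scheme: each frame carries a halting score, scores are summed, and an output is emitted once the sum reaches a threshold. The function name `segment_by_halting`, the `halt_probs` input, and the `threshold` parameter are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def segment_by_halting(
    halt_probs: Sequence[float], threshold: float = 1.0
) -> List[List[int]]:
    """Group frame indices into segments, one segment per emitted output.

    ASSUMPTION: halt_probs are per-frame halting scores in [0, 1] that some
    encoder would produce; here they are just given as input. This is an
    illustrative accumulate-and-threshold rule, not the paper's exact method.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    accumulated = 0.0
    for t, p in enumerate(halt_probs):
        current.append(t)
        accumulated += p
        # Enough evidence gathered: close the segment and start a new one.
        if accumulated >= threshold:
            segments.append(current)
            current = []
            accumulated = 0.0
    # Frames left over at the end form a final, partial segment.
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    probs = [0.25, 0.5, 0.5, 0.75, 0.5, 0.125]
    print(segment_by_halting(probs))
```

Because each segment closes as soon as its threshold is reached, a decision of this kind depends only on frames already received, which is the property that makes online use possible.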
Related papers
- Transformer-based Online Speech Recognition With Decoder-end Adaptive Computation Steps (2020)
- Transformer-based Online CTC/Attention End-to-end Speech Recognition Architecture (2020)
- Attention-based Gated Scaling Adaptative Acoustic Model For CTC-based Speech Recognition (2019)
- Alignment Knowledge Distillation For Online Streaming Attention-based Speech Recognition (2021)
- EffectiveASR: A Single-step Non-autoregressive Mandarin Speech Recognition Architecture With High Accuracy And Inference Speed (2024)
- Towards End-to-end Code-switching Speech Recognition (2018)
- Unified Streaming And Non-streaming Two-pass End-to-end Model For Speech Recognition (2020)
- Integrating Source-channel And Attention-based Sequence-to-sequence Models For Speech Recognition (2019)