An Online Attention-based Model For Speech Recognition
2018 · Ruchao Fan, Pan Zhou, Wei Chen, et al.
Abstract
Attention-based end-to-end models such as Listen, Attend and Spell (LAS) simplify the whole pipeline of traditional automatic speech recognition (ASR) systems and have become popular in the field of speech recognition. Previous work has shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when they use a bidirectional encoder and a global soft attention (GSA) mechanism. However, the bidirectional encoder and GSA are two obstacles to real-time speech recognition. In this work, we aim to make the LAS baseline streamable by removing these two obstacles. On the encoder side, we use a latency-controlled (LC) bidirectional structure to reduce the delay of forward computation. Meanwhile, an adaptive monotonic chunk-wise attention (AMoChA) mechanism is proposed to replace GSA for the calculation of the attention weight distribution. Furthermore, we propose two methods to alleviate the large performance degradation that occurs when LC and AMoChA are combined. Finally
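The abstract contrasts GSA, which normalizes attention over every encoder frame, with monotonic chunk-wise attention, which only looks at a short window ending at a monotonically advancing frame. As a rough illustration only, the sketch below shows the test-time behavior of standard, fixed-chunk monotonic chunkwise attention (MoChA), not the paper's adaptive AMoChA. It assumes the model has already produced two lists of per-frame energies (one for frame selection, one for the chunk softmax), assumes the usual test-time rule of selecting the first frame whose selection probability exceeds 0.5, and uses hypothetical function names (`mocha_step`, `global_soft_attention`).

```python
import math
from typing import List, Optional, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def global_soft_attention(energies: List[float]) -> List[float]:
    """GSA: normalizes over all T encoder frames, so the whole
    utterance must be encoded before any output token is emitted."""
    return softmax(energies)


def mocha_step(
    monotonic_energies: List[float],
    chunk_energies: List[float],
    prev_index: int,
    chunk_size: int,
) -> Tuple[Optional[int], List[float]]:
    """One decoder step of test-time monotonic chunkwise attention (sketch).

    Scans forward from the previously attended frame and stops at the
    first frame j whose selection probability sigmoid(e_j) exceeds 0.5,
    then applies soft attention over the chunk_size frames ending at j.
    The chunk size is fixed here; the paper's adaptive variant is NOT modeled.
    Returns (j, weights); if no frame is selected, returns (None, all zeros).
    """
    num_frames = len(monotonic_energies)
    weights = [0.0] * num_frames
    for j in range(prev_index, num_frames):
        if sigmoid(monotonic_energies[j]) > 0.5:
            start = max(0, j - chunk_size + 1)
            local = softmax(chunk_energies[start : j + 1])
            for offset, w in enumerate(local):
                weights[start + offset] = w
            return j, weights
    return None, weights


if __name__ == "__main__":
    mono = [-1.0, -2.0, 0.5, 3.0, 1.0]
    chunk = [0.0, 1.0, 1.0, 0.0, 0.0]
    j, w = mocha_step(mono, chunk, prev_index=0, chunk_size=2)
    print(j, w)  # 2 [0.0, 0.5, 0.5, 0.0, 0.0]
    print(global_soft_attention(chunk))  # non-zero weight on every frame
```

Because every weight after the selected frame j is zero, each output token in this sketch needs encoder frames only up to j (plus whatever look-ahead the encoder itself uses), whereas GSA needs all T frames before it can produce any output.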
Related papers
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017)
- Listen Attentively, And Spell Once: Whole Sentence Generation Via A Non-autoregressive Architecture For Low-latency Speech Recognition (2020)
- Streaming Chunk-aware Multihead Attention For Online End-to-end Speech Recognition (2020)
- Fast End-to-end Speech Recognition Via Non-autoregressive Models And Cross-modal Knowledge Transferring From BERT (2021)
- Streaming Attention-based Models With Augmented Memory For End-to-end Speech Recognition (2020)
- Online Hybrid CTC/Attention End-to-end Automatic Speech Recognition Architecture (2023)
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Attention-based Sequence-to-sequence Model For Speech Recognition: Development Of State-of-the-art System On LibriSpeech And Its Application To Non-native English (2018)