Streaming Transformer-based Acoustic Models Using Self-attention With Augmented Memory
2020 · Chunyang Wu, Yongqiang Wang, Yangyang Shi, et al.
Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full sequence, and the computational cost grows quadratically with respect to the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented memory self-attention, which attends on a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all the existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
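The segment-plus-memory idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of identity projections instead of learned query/key/value weights, the single attention head, and the choice of a mean-pooled segment summary as the query that produces each new memory vector are all assumptions made for illustration. The sketch uses only the Python standard library.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def mean_vector(frames):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def stream_attention(frames, segment_len):
    """Process frames one segment at a time (hypothetical helper).

    Each frame attends only to the current segment and the memory bank,
    so the per-frame cost depends on segment_len plus the number of
    stored memories rather than on the full sequence length.
    """
    memory_bank = []  # one vector per already-processed segment
    outputs = []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        # Keys and values: stored memories followed by the current segment.
        # Assumption: identity projections stand in for learned weights.
        context = memory_bank + segment
        for frame in segment:
            outputs.append(attend(frame, context, context))
        # Assumption: the new memory is the attention output of a
        # mean-pooled summary of the segment.
        summary_query = mean_vector(segment)
        memory_bank.append(attend(summary_query, context, context))
    return outputs, memory_bank


random.seed(0)
frames = [[random.random() for _ in range(4)] for _ in range(10)]
outputs, memory_bank = stream_attention(frames, segment_len=4)
print(len(outputs), len(memory_bank))  # 10 3
```

With 10 frames and a segment length of 4, the input is split into segments of 4, 4, and 2 frames, so the bank ends with three memory vectors. A real model would add learned projections, multiple heads, and possibly left and right context frames around each segment.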
Related papers
- Streaming Simultaneous Speech Translation With Augmented Memory Transformer (2020) 6.77
- Streaming Attention-based Models With Augmented Memory For End-to-end Speech Recognition (2020) 5.84
- Transformer-based Acoustic Modeling For Hybrid Speech Recognition (2019) 16.30
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019) 0.00
- A Low Latency Attention Module For Streaming Self-supervised Speech Representation Learning (2023) 0.00
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020) 11.08
- Transformer In Action: A Comparative Study Of Transformer-based Acoustic Models For Large Scale Speech Recognition Applications (2020) 9.41
- Unidirectional Memory-self-attention Transducer For Online Speech Recognition (2021) 3.58