Unidirectional Memory-self-attention Transducer For Online Speech Recognition
2021 · Jian Luo, Jianzong Wang, Ning Cheng, et al.
Abstract
Self-attention models have been successfully applied in end-to-end speech recognition systems, where they greatly improve recognition accuracy. However, such attention-based models cannot be used for online speech recognition, because they usually require a whole acoustic sequence as input. A common remedy is to restrict the attention field to a fixed left and right window, which keeps the computation cost manageable but also degrades performance. In this paper, we propose Memory-Self-Attention (MSA), which adds history information to the Restricted-Self-Attention unit. MSA needs only local-time features as inputs, and efficiently models long temporal contexts by attending to memory states. Meanwhile, the recurrent neural network transducer (RNN-T) has proved to be a strong approach for online ASR tasks, because its alignments are local and monotonic. We propose a novel network structure, called Memory-Self-Attention (MSA) Trans
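As a rough illustration of the windowed attention and memory states mentioned in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation: the function names (`window_mask`, `restricted_self_attention`) are hypothetical, there are no learned query/key/value projections, and the memory states are simply passed in as an array, since the abstract does not say how they are computed.

```python
import numpy as np

def window_mask(num_frames, left, right):
    """Boolean mask: mask[t, s] is True when frame t may attend to frame s."""
    idx = np.arange(num_frames)
    offset = idx[None, :] - idx[:, None]  # s - t for every (t, s) pair
    return (offset >= -left) & (offset <= right)

def restricted_self_attention(x, left, right, memory=None):
    """Single-head dot-product attention limited to a local window.

    x:      (T, d) frame features, used directly as queries, keys and values.
    memory: optional (M, d) array of memory states that every frame may
            attend to. ASSUMPTION: how such states are built is not given
            in the abstract; here they are just extra key/value rows.
    """
    num_frames, dim = x.shape
    keys = x
    mask = window_mask(num_frames, left, right)
    if memory is not None:
        # Prepend memory rows and let every frame see all of them.
        keys = np.concatenate([memory, x], axis=0)
        mem_mask = np.ones((num_frames, memory.shape[0]), dtype=bool)
        mask = np.concatenate([mem_mask, mask], axis=1)
    scores = x @ keys.T / np.sqrt(dim)
    scores = np.where(mask, scores, -np.inf)      # block out-of-window frames
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keys, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(6, 4))
    mem = rng.normal(size=(3, 4))
    out, w = restricted_self_attention(frames, left=2, right=0, memory=mem)
    print(out.shape, w.shape)
```

In this sketch, setting `right=0` means no frame looks ahead of itself, so each output depends only on the current frame, the `left` preceding frames, and whatever memory rows are supplied.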
Related papers
- Self-attention Transducers For End-to-end Speech Recognition (2019) 11.93
- Transformer-based Online CTC/Attention End-to-end Speech Recognition Architecture (2020) 14.06
- Streaming Transformer-based Acoustic Models Using Self-attention With Augmented Memory (2020) 0.00
- DFSMN-SAN With Persistent Memory Model For Automatic Speech Recognition (2019) 5.84
- An Online Attention-based Model For Speech Recognition (2018) 9.59
- Towards Online End-to-end Transformer Automatic Speech Recognition (2019) 0.00
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00
- Streaming Attention-based Models With Augmented Memory For End-to-end Speech Recognition (2020) 5.84