Transformer-based End-to-end Speech Recognition With Local Dense Synthesizer Attention
2020 · Menglong Xu, Shengqiang Li, Xiao-Lei Zhang
Abstract
Recently, several studies reported that dot-product self-attention (SA) may not be indispensable to state-of-the-art Transformer models. Dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, has achieved competitive results in many language processing tasks. Motivated by this, we first propose DSA-based speech recognition as an alternative to SA. To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current central frame. Finally, we combine LDSA with SA to extract local and global information simultaneously. Experimental results on the AISHELL-1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, which is slightly better than that of the SA-Transformer. Meanwhile, the LDSA-Transformer requires less computati
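The local restriction described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The function name, the weight names `w1`, `w2`, `w3`, the ReLU nonlinearity, the zero padding at utterance edges, and the context width are all assumptions made for the example. The point it shows is that each frame's attention weights are predicted from that frame alone (no dot products between frames) and cover only a small window of neighbouring frames.

```python
import numpy as np

def local_dense_synthesizer_attention(x, w1, w2, w3, context=3):
    """Hypothetical LDSA-style layer.

    x:  (T, d) input frames
    w1: (d, d) hidden projection (assumed)
    w2: (d, context) projection to one logit per window position (assumed)
    w3: (d, d) value projection (assumed)
    """
    assert context % 2 == 1, "this sketch assumes an odd window width"
    T, _ = x.shape
    half = context // 2

    # Logits are synthesized from each frame independently: shape (T, context).
    logits = np.maximum(x @ w1, 0.0) @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Values, zero-padded so edge frames still see a full-width window.
    values = x @ w3
    padded = np.pad(values, ((half, half), (0, 0)))

    # Weighted sum over the local window centred on each frame.
    out = np.zeros_like(values)
    for j in range(context):
        out += weights[:, j:j + 1] * padded[j:j + T]
    return out
```

Under these assumptions the cost per frame grows with the window width rather than with the utterance length, which is consistent with the abstract's claim that the local variant reduces computational complexity.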
Related papers
- Simplified Self-attention For Transformer-based End-to-end Speech Recognition (2020) 10.61
- Similarity And Content-based Phonetic Self Attention For Speech Recognition (2022) 5.24
- Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture (2020) 14.06
- Self-attention Transducers For End-to-end Speech Recognition (2019) 11.93
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019) 0.00
- Transformer-based End-to-end Speech Recognition With Residual Gaussian-based Self-attention (2021) 5.84
- Transformer-based Online Speech Recognition With Decoder-end Adaptive Computation Steps (2020) 7.81
- Improving Hybrid CTC/attention End-to-end Speech Recognition With Pretrained Acoustic And Language Model (2021) 8.82