Transformer-based End-to-end Speech Recognition With Residual Gaussian-based Self-attention
2021 · Chengdong Liang, Menglong Xu, Xiao-Lei Zhang
Abstract
Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition due to its strong context modeling ability. However, its accuracy degrades on long sequences, because its weighted-average operator can disperse the attention distribution, causing the relationship between adjacent signals to be ignored. To address this issue, we introduce relative-position-awareness self-attention (RPSA), which maintains the global-range dependency modeling ability of self-attention while improving its localness modeling ability. Because the local window length of the original RPSA is fixed and sensitive to different test data, we propose Gaussian-based self-attention (GSA), whose window length is learnable and adapts to the test data automatically. We further generalize GSA to a new residual Gaussian self-attention (resGSA) for the performance impr
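The idea of a Gaussian window on self-attention can be illustrated with a minimal sketch. It assumes one common formulation, in which a Gaussian bias -(i - j)^2 / (2 * sigma^2) is added to the scaled dot-product score before the softmax; the abstract does not state the exact form, so this is not necessarily the paper's. The function names and the fixed `sigma` argument are hypothetical. In a learnable variant `sigma` would be a trained parameter, and the residual connection of resGSA is not modeled here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_attention_weights(queries, keys, sigma):
    """Attention weights with an additive Gaussian locality bias.

    Assumption (not taken from the paper): the score for positions (i, j) is
    the scaled dot product plus -(i - j)^2 / (2 * sigma^2).
    A small sigma gives a narrow local window; a large sigma approaches
    plain self-attention.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            similarity = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            bias = -((i - j) ** 2) / (2.0 * sigma ** 2)
            row.append(similarity + bias)
        weights.append(softmax(row))
    return weights


# Five identical frames: plain self-attention would weight them uniformly,
# so any concentration around the diagonal comes from the Gaussian bias alone.
frames = [[1.0, 0.0] for _ in range(5)]
narrow = gaussian_attention_weights(frames, frames, sigma=1.0)
wide = gaussian_attention_weights(frames, frames, sigma=100.0)
```

With identical frames, `narrow[2]` peaks at position 2 and decays symmetrically on both sides, while `wide[2]` stays close to uniform. This is the trade-off between localness and global-range modeling that the window length controls.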
Related papers
- Gaussian Kernelized Self-attention For Long Sequence Data And Its Application To Ctc-based Speech Recognition (2021) 4.52
- T-GSA: Transformer With Gaussian-weighted Self-attention For Speech Enhancement (2019) 15.95
- Self-attention Transducers For End-to-end Speech Recognition (2019) 11.93
- Adversarial Joint Training With Self-attention Mechanism For Robust End-to-end Speech Recognition (2021) 0.00
- Unidirectional Memory-self-attention Transducer For Online Speech Recognition (2021) 3.58
- Transformer-based End-to-end Speech Recognition With Local Dense Synthesizer Attention (2020) 12.04
- Similarity And Content-based Phonetic Self Attention For Speech Recognition (2022) 5.24
- Location-relative Attention Mechanisms For Robust Long-form Speech Synthesis (2019) 13.11