Gaussian Kernelized Self-attention For Long Sequence Data And Its Application To CTC-based Speech Recognition
2021 · Yosuke Kashiwagi, Emiru Tsunoo, Shinji Watanabe
Abstract
Self-attention (SA) based models have recently achieved significant performance improvements in hybrid and end-to-end automatic speech recognition (ASR) systems owing to their flexible context modeling capability. However, accuracy is known to degrade when SA is applied to long sequence data. This is mainly due to a length mismatch between inference and training data, because training data are usually divided into short segments for efficient training. To mitigate this mismatch, we propose a new architecture based on a variant of the Gaussian kernel, which is itself a shift-invariant kernel. First, we mathematically demonstrate that self-attention with shared weight parameters for queries and keys is equivalent to a normalized kernel function. By replacing this kernel function with the proposed Gaussian kernel, the architecture becomes completely shift-invariant, with the relative position information embedded using a frame indexing technique. The proposed Gaussia
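The kernel view described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the bandwidth parameter `sigma`, and the choice to pass already-projected features `z` (one shared projection used for both queries and keys) are assumptions made for illustration, and the frame indexing technique is not modeled here. The sketch compares row-normalized dot-product weights with row-normalized Gaussian kernel weights and checks that only the latter are unchanged when every feature vector is shifted by the same offset.

```python
import numpy as np


def dot_product_weights(z):
    """Row-normalized exp(<z_i, z_j>): softmax attention with shared query/key features."""
    scores = z @ z.T
    # Subtract the row maximum for numerical stability; this does not change the softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    k = np.exp(scores)
    return k / k.sum(axis=1, keepdims=True)


def gaussian_kernel_weights(z, sigma=1.0):
    """Row-normalized Gaussian kernel exp(-||z_i - z_j||^2 / (2 * sigma^2))."""
    diff = z[:, None, :] - z[None, :, :]   # pairwise differences, shape (T, T, D)
    sq_dist = (diff ** 2).sum(axis=-1)     # squared Euclidean distances, shape (T, T)
    k = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return k / k.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
z = rng.normal(size=(5, 4))   # 5 frames, 4-dimensional projected features (toy data)
c = rng.normal(size=(4,))     # a common offset added to every frame

# The Gaussian kernel depends only on z_i - z_j, so a common shift leaves it unchanged.
gaussian_is_shift_invariant = np.allclose(
    gaussian_kernel_weights(z), gaussian_kernel_weights(z + c)
)
# The dot-product kernel picks up a factor exp(<c, z_j>) that survives row normalization.
dot_product_is_shift_invariant = np.allclose(
    dot_product_weights(z), dot_product_weights(z + c)
)
```

Because the Gaussian weights depend only on differences between feature vectors, they are insensitive to a common offset, which is the shift-invariance property the abstract refers to.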
Related papers
- Transformer-based End-to-end Speech Recognition With Residual Gaussian-based Self-attention (2021)
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019)
- T-GSA: Transformer With Gaussian-weighted Self-attention For Speech Enhancement (2019)
- Similarity And Content-based Phonetic Self Attention For Speech Recognition (2022)
- End-to-end Contextual ASR Based On Posterior Distribution Adaptation For Hybrid CTC/attention System (2022)
- Attention-based Gated Scaling Adaptative Acoustic Model For CTC-based Speech Recognition (2019)
- Self-attention Transducers For End-to-end Speech Recognition (2019)
- Advancing Connectionist Temporal Classification With Attention Modeling (2018)