State-of-the-art Speech Recognition Using Multi-stream Self-attention With Dilated 1D Convolutions
2019 · Kyu J. Han, Ramon Prieto, Kaixing Wu, et al.
Abstract
Self-attention has been a huge success for many downstream tasks in NLP, which has led to exploration of applying self-attention to speech problems as well. The efficacy of self-attention in speech applications, however, does not yet seem to be fully realized, since it is challenging to handle highly correlated speech frames in the context of self-attention. In this paper we propose a new neural network model architecture, namely multi-stream self-attention, to address this issue and thus make the self-attention mechanism more effective for speech recognition. The proposed model architecture consists of parallel streams of self-attention encoders, and each stream has layers of 1D convolutions with dilated kernels, whose dilation rate is unique to that stream, followed by a self-attention layer. The self-attention mechanism in each stream attends to only one resolution of the input speech frames, so the attentive computation can be more efficient. In a later stage, outputs from all the streams are concatena
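The stream structure described in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation. It assumes scalar features per frame, a single dilated convolution per stream (the abstract describes several layers), identity query/key/value projections, and cropping to the shortest stream before concatenation. The function names `dilated_conv1d`, `self_attention`, and `multi_stream`, and the example kernel and dilation rates, are hypothetical.

```python
import math

def dilated_conv1d(frames, kernel, dilation):
    """Valid (unpadded) 1D convolution over scalar frames with a dilated kernel."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * frames[t + k * dilation] for k, w in enumerate(kernel))
        for t in range(len(frames) - span)
    ]

def self_attention(seq):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    out = []
    for q in seq:
        scores = [q * k for k in seq]  # feature dimension is 1, so the scale is 1
        peak = max(scores)             # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, seq)) / total)
    return out

def multi_stream(frames, kernel, dilations):
    """One conv + attention stream per dilation rate, concatenated per time step."""
    streams = [self_attention(dilated_conv1d(frames, kernel, d)) for d in dilations]
    length = min(len(s) for s in streams)  # crop to the shortest stream (simplification)
    return [[s[t] for s in streams] for t in range(length)]

# Hypothetical input: 20 scalar "frames" and three streams with dilation 1, 2, 3.
frames = [0.1 * t for t in range(20)]
features = multi_stream(frames, kernel=[0.25, 0.5, 0.25], dilations=[1, 2, 3])
print(len(features), len(features[0]))
```

Each stream sees the input at a single temporal resolution set by its dilation rate, and the concatenation step yields one feature per stream at every retained time step. A realistic model would use vector-valued frames, learned projections, and padding to keep the streams time-aligned.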
Related papers
- Speech Enhancement Using Multi-stage Self-attentive Temporal Convolutional Networks (2021) · 14.15
- Multistream CNN For Robust Acoustic Modeling (2020) · 10.21
- Self Multi-head Attention For Speaker Recognition (2019) · 13.84
- Saladnet: Self-attentive Multisource Localization In The Ambisonics Domain (2021) · 7.50
- Automatic Lyrics Transcription Using Dilated Convolutional Neural Networks With Self-attention (2020) · 10.07
- Df-conformer: Integrated Architecture Of Conv-tasnet And Conformer Using Linear Complexity Self-attention For Speech Enhancement (2021) · 11.29
- Attention-based Neural Beamforming Layers For Multi-channel Speech Recognition (2021) · 0.00
- End-to-end Language Identification Using Multi-head Self-attention And 1D Convolutional Neural Networks (2021) · 0.00