Simplified Self-attention For Transformer-based End-to-end Speech Recognition
2020 · Haoneng Luo, Shiliang Zhang, Ming Lei, et al.
Abstract
Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained through the use of very large neural networks. Transformer models mainly include two submodules: position-wise feedforward layers and self-attention (SAN) layers. In this paper, to reduce the model complexity while maintaining good performance, we propose a simplified self-attention (SSAN) layer which employs an FSMN memory block instead of projection layers to form query and key vectors for transformer-based end-to-end speech recognition. We evaluate the SSAN-based and the conventional SAN-based transformers on the public AISHELL-1 task and on internal 1000-hour and 20,000-hour large-scale Mandarin tasks. Results show that our proposed SSAN-based transformer model can achieve over 20% relative reduction in model parameters and 6.7% relative CER reduction on the AIS
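The abstract states only that an FSMN memory block replaces the query and key projections. It does not give the layer's equations. The following is therefore a minimal sketch under stated assumptions, not the authors' implementation. It assumes the memory block is a per-dimension FIR filter over neighbouring frames with a skip connection, that its output serves as both query and key, and that values still come from a learned projection. The output projection, multiple heads, and biases are omitted. All function and parameter names (`fsmn_memory`, `left`, `right`, `w_v`, `param_counts`) are hypothetical.

```python
import math


def fsmn_memory(x, left, right):
    """Assumed FSMN-style memory block: per-dimension FIR filter over time.

    x     : list of T frames, each a list of d floats
    left  : tap weights for past frames, left[i][k] weights frame t-1-i
    right : tap weights for future frames, right[j][k] weights frame t+1+j
    """
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        frame = list(x[t])  # skip connection: start from the current frame
        for i, taps in enumerate(left):
            s = t - 1 - i
            if s >= 0:
                for k in range(d):
                    frame[k] += taps[k] * x[s][k]
        for j, taps in enumerate(right):
            s = t + 1 + j
            if s < T:
                for k in range(d):
                    frame[k] += taps[k] * x[s][k]
        out.append(frame)
    return out


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    z = sum(e)
    return [v / z for v in e]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def simplified_self_attention(x, left, right, w_v):
    """Single-head scaled dot-product attention with Q = K = fsmn_memory(x)."""
    d = len(x[0])
    m = fsmn_memory(x, left, right)  # assumption: shared query/key vectors
    v = matmul(x, w_v)               # values still use a learned projection
    out = []
    for q in m:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in m]
        w = softmax(scores)
        out.append([sum(w[s] * v[s][j] for s in range(len(v)))
                    for j in range(len(v[0]))])
    return out


def param_counts(d, n_left, n_right):
    """Weights for Q/K/V formation only (biases and output projection ignored)."""
    san = 3 * d * d                          # W_q, W_k, W_v
    ssan = d * d + (n_left + n_right) * d    # W_v plus the memory-block taps
    return san, ssan
```

Under these assumptions the query and key path costs `(n_left + n_right) * d` weights instead of `2 * d * d`, which illustrates where a parameter saving could come from. The figures reported in the abstract refer to the authors' full models, not to this sketch.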
Related papers
- Transformer-based End-to-end Speech Recognition With Local Dense Synthesizer Attention (2020) · 12.04
- DFSMN-SAN With Persistent Memory Model For Automatic Speech Recognition (2019) · 5.84
- Self-attention Transducers For End-to-end Speech Recognition (2019) · 11.93
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019) · 0.00
- EfficientASR: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization (2024) · 3.58
- Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture (2020) · 14.06
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019) · 14.55
- Improving Transformer-based Conversational ASR By Inter-sentential Attention Mechanism (2022) · 7.50