EfficientASR: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization
2024 · Jianzong Wang, Ziqi Liang, Xulong Zhang, et al.
Abstract
In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their high computational and storage requirements make them difficult to deploy. To address this issue, this paper proposes EfficientASR, a lightweight model intended to make Transformer models more widely applicable. EfficientASR employs two primary modules: Shared Residual Multi-Head Attention (SRMHA) and Chunk-Level Feedforward Networks (CFFN). The SRMHA module reduces redundant computation in the network, while the CFFN module captures spatial knowledge and reduces the number of parameters. EfficientASR is validated on two public datasets, Aishell-1 and HKUST. Experimental results show a 36% reduction in parameters compared to the baseline Transformer network, along with Character Error Rate (CER) improvements of 0.3% on Aishell-1 and 0.2% on HKUST.
Related papers
- Simplified Self-attention For Transformer-based End-to-end Speech Recognition (2020) · 10.61
- An Efficient Speech Separation Network Based On Recurrent Fusion Dilated Convolution And Channel Attention (2023) · 0.00
- EffCRN: An Efficient Convolutional Recurrent Network For High-performance Speech Enhancement (2023) · 5.84
- Lightweight And Efficient End-to-end Speech Recognition Using Low-rank Transformer (2019) · 0.00
- Accurate And Structured Pruning For Efficient Automatic Speech Recognition (2023) · 7.81
- Resource-efficient Separation Transformer (2022) · 7.81
- Transformer-based Online CTC/Attention End-to-end Speech Recognition Architecture (2020) · 14.06
- CIF-T: A Novel CIF-based Transducer Architecture For Automatic Speech Recognition (2023) · 0.00