Lightweight And Efficient End-to-end Speech Recognition Using Low-rank Transformer
2019 · Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, et al.
Abstract
Highly performing deep neural networks come at a computational cost that limits their practicality for deployment on portable devices. We propose the low-rank transformer (LRT), a memory-efficient and fast neural architecture that significantly reduces the number of parameters and speeds up both training and inference for end-to-end speech recognition. Our approach reduces the number of network parameters by more than 50% and speeds up inference by around 1.35x compared to the baseline transformer model. The experiments show that our LRT model generalizes better and yields lower error rates on both validation and test sets than an uncompressed transformer model. The LRT model also outperforms existing work on several datasets in an end-to-end setting, without an external language model or external acoustic data.
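The abstract does not spell out how the parameter savings arise. A common way to build a low-rank layer is to replace a dense weight matrix W (d_out × d_in) with the product of two thinner matrices U (d_out × r) and V (r × d_in). The sketch below assumes that factorization; the function names and the sizes d_model=512, d_ff=2048, rank=128 are hypothetical illustrations, not the paper's configuration, and the resulting ratio is not meant to reproduce the reported figures.

```python
# Hypothetical sketch: a dense layer y = W x + b versus a rank-r factorization
# y = U (V x) + b. Sizes and names are illustrative assumptions only.

def dense_params(d_in: int, d_out: int) -> int:
    """Weights plus bias for y = W x + b, with W of shape (d_out, d_in)."""
    return d_out * d_in + d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights plus bias for y = U (V x) + b, with V: (rank, d_in), U: (d_out, rank)."""
    return rank * d_in + d_out * rank + d_out


def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def low_rank_forward(u, v, b, x):
    """Compute U (V x) + b without materializing the full d_out x d_in matrix."""
    h = matvec(v, x)  # project the input down to `rank` dimensions
    y = matvec(u, h)  # project back up to d_out dimensions
    return [yi + bi for yi, bi in zip(y, b)]


if __name__ == "__main__":
    d_model, d_ff, rank = 512, 2048, 128  # hypothetical sizes
    full = dense_params(d_model, d_ff)
    low = low_rank_params(d_model, d_ff, rank)
    print(full, low, round(low / full, 3))
```

Under this assumption the factorization only saves weights when r < d_in·d_out / (d_in + d_out), which is about 409.6 for a 512 × 2048 layer; a smaller rank trades model capacity for memory and compute.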
Related papers
- Multitask Learning And Joint Optimization For Transformer-rnn-transducer Speech Recognition (2020)
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020)
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019)
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021)
- Bayesspeech: A Bayesian Transformer Network For Automatic Speech Recognition (2023)
- Full-rank No More: Low-rank Weight Training For Modern Speech Recognition Models (2024)
- Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020)
- Efficientasr: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization (2024)