CTC Blank Triggered Dynamic Layer-skipping For Efficient CTC-based Speech Recognition
2024 · Junfeng Hou, Peiyao Wang, Jincheng Zhang, et al.
Abstract
Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance. Given the gradual increase in model size and the wide range of model applications, selectively executing model components for different inputs to improve the inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method that leverages the CTC blank output from intermediate layers to trigger the skipping of the last few encoder layers for frames with high blank probabilities. Furthermore, we factorize the CTC output distribution and perform knowledge distillation on intermediate layers to reduce computation and improve recognition accuracy. Experimental results show that by utilizing the CTC blank, the encoder layer depth can be adjusted dynamically, resulting in 29% acceleration of the CTC model inference with minor performance degradation.
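The skipping mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each encoder layer can be applied to a frame independently (real encoders with self-attention process frames jointly), and the names `encode_with_skipping`, `blank_prob`, `threshold`, and `layer_calls_saved` are hypothetical. The threshold default is arbitrary, and the factorized output distribution and knowledge distillation are not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]
Layer = Callable[[Frame], Frame]


def select_frames_to_skip(blank_probs: Sequence[float], threshold: float) -> List[bool]:
    """Flag frames whose intermediate blank probability exceeds the threshold."""
    return [p > threshold for p in blank_probs]


def encode_with_skipping(
    frames: Sequence[Frame],
    lower_layers: Sequence[Layer],
    upper_layers: Sequence[Layer],
    blank_prob: Callable[[Frame], float],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[List[Frame], List[bool]]:
    """Run all frames through the lower layers, then skip the upper layers
    for frames that the intermediate CTC branch considers blank."""
    # Every frame passes through the lower layers.
    hidden = [list(f) for f in frames]
    for layer in lower_layers:
        hidden = [layer(h) for h in hidden]

    # Hypothetical intermediate CTC head: one blank probability per frame.
    skip = select_frames_to_skip([blank_prob(h) for h in hidden], threshold)

    # Flagged frames keep their intermediate representation unchanged;
    # the remaining frames continue through the upper layers.
    for layer in upper_layers:
        hidden = [h if s else layer(h) for h, s in zip(hidden, skip)]
    return hidden, skip


def layer_calls_saved(skip: Sequence[bool], n_lower: int, n_upper: int) -> float:
    """Fraction of per-frame layer evaluations avoided by skipping."""
    total = len(skip) * (n_lower + n_upper)
    saved = sum(skip) * n_upper
    return saved / total if total else 0.0
```

The fraction returned by `layer_calls_saved` only counts per-frame layer evaluations in this toy setting; it is not a measure of wall-clock speedup and should not be compared with the 29% figure reported in the abstract.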
Related papers
- Inter-KD: Intermediate Knowledge Distillation For CTC-based Automatic Speech Recognition (2022)
- Leveraging Language ID To Calculate Intermediate CTC Loss For Enhanced Code-switching Speech Recognition (2023)
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017)
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022)
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020)
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)
- Towards Personalization Of CTC Speech Recognition Models With Contextual Adapters And Adaptive Boosting (2022)
- Blank Collapse: Compressing CTC Emission For The Faster Decoding (2022)