Breaking Through The Spike: Spike Window Decoding For Accelerated And Precise Automatic Speech Recognition
2025 · Wei Zhang, Tian-Hao Zhang, Chao Luo, et al.
Abstract
Recently, end-to-end automatic speech recognition has become the mainstream approach in both industry and academia. To optimize system performance in specific scenarios, the Weighted Finite-State Transducer (WFST) is extensively used to integrate acoustic and language models, leveraging its capacity to implicitly fuse language models within static graphs, thereby ensuring robust recognition while also facilitating rapid error correction. However, WFST necessitates a frame-by-frame search of CTC posterior probabilities through autoregression, which significantly hampers inference speed. In this work, we thoroughly investigate the spike property of CTC outputs and further propose the conjecture that adjacent frames to non-blank spikes carry semantic information beneficial to the model. Building on this, we propose the Spike Window Decoding algorithm, which greatly improves the inference speed by making the number of frames decoded in WFST linearly related to the number of spiking frames
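The frame-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spike_window_frames`, the symmetric `window` parameter, and the use of a per-frame argmax to detect non-blank spikes are all assumptions made for illustration, since the abstract does not give the exact rule. The sketch keeps each non-blank spike frame plus its neighbours, so the number of frames handed to a WFST decoder grows with the number of spikes rather than with the utterance length.

```python
from typing import List, Sequence


def spike_window_frames(
    posteriors: Sequence[Sequence[float]],
    blank_id: int = 0,   # assumed index of the CTC blank symbol
    window: int = 1,     # assumed number of neighbouring frames kept on each side
) -> List[int]:
    """Return sorted indices of frames inside a window around non-blank spikes."""
    num_frames = len(posteriors)
    keep = set()
    for t, frame in enumerate(posteriors):
        # A frame counts as a spike here when its most likely label is not blank.
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != blank_id:
            lo = max(0, t - window)
            hi = min(num_frames - 1, t + window)
            keep.update(range(lo, hi + 1))
    return sorted(keep)


# Toy CTC posteriors: 8 frames, 3 labels (label 0 is blank).
posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],  # spike for label 1 at frame 2
    [0.90, 0.05, 0.05],
    [0.95, 0.03, 0.02],
    [0.90, 0.05, 0.05],
    [0.20, 0.10, 0.70],  # spike for label 2 at frame 6
    [0.90, 0.05, 0.05],
]

selected = spike_window_frames(posteriors, blank_id=0, window=1)
print(selected)  # [1, 2, 3, 5, 6, 7]
```

In this toy input, two spikes with a window of one frame select six of the eight frames; only the selected rows of the posterior matrix would then be passed to the WFST search. On real CTC outputs, where blank frames dominate, the retained fraction would typically be much smaller.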
Related papers
- Spike-triggered Non-autoregressive Transformer For End-to-end Speech Recognition (2020)
- WNARS: WFST Based Non-autoregressive Streaming End-to-end Speech Recognition (2021)
- GPU-accelerated WFST Beam Search Decoder For CTC-based Speech Recognition (2023)
- Streaming Keyword Spotting Boosted By Cross-layer Discrimination Consistency (2024)
- Integration Of Frame- And Label-synchronous Beam Search For Streaming Encoder-decoder Speech Recognition (2023)
- GPU-accelerated Viterbi Exact Lattice Decoder For Batched Online And Offline Speech Recognition (2019)
- End-to-end Adaptation With Backpropagation Through WFST For On-device Speech Recognition System (2019)
- SpeechNet: Weakly Supervised, End-to-end Speech Recognition At Industrial Scale (2022)