Label-looping: Highly Efficient Decoding For Transducers
2024 · Vladimir Bataev, Hainan Xu, Daniel Galvez, et al.
Abstract
This paper introduces a highly efficient greedy decoding algorithm for Transducer-based speech recognition models. We restructure the standard nested loops of RNN-T decoding by swapping the loops over frames and labels: the outer loop iterates over labels, while the inner loop iterates over frames, searching for the next non-blank symbol. Additionally, we represent partial hypotheses in a special structure built on CUDA tensors, which supports parallelized hypothesis manipulation. Experiments show that the label-looping algorithm is up to 2.0X faster than conventional batched decoding at batch size 32. It can be combined with other compiler-level or GPU-call-related techniques for further speedup. Our algorithm is general-purpose and works with both conventional Transducers and Token-and-Duration Transducers. We open-source our implementation to benefit the research community.
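The loop swap described in the abstract can be illustrated with a minimal single-utterance sketch in plain Python. This is not the authors' implementation, which is batched and operates on CUDA tensors. The names `toy_joint`, `frame_looping_greedy`, `label_looping_greedy`, and `max_symbols` are hypothetical, and the toy joint function merely stands in for a real prediction network and joint network.

```python
BLANK = 0  # assumed blank index for this toy example


def toy_joint(frame, hyp):
    """Hypothetical stand-in for predictor + joint: returns the greedy label.

    Emits the frame's label unless it is blank or repeats the last emitted label.
    """
    if frame == BLANK or (hyp and hyp[-1] == frame):
        return BLANK
    return frame


def frame_looping_greedy(frames, joint, max_symbols=10):
    """Conventional greedy decoding: outer loop over frames, inner loop over labels."""
    hyp = []
    for frame in frames:
        for _ in range(max_symbols):
            label = joint(frame, hyp)
            if label == BLANK:
                break  # blank: move on to the next frame
            hyp.append(label)
    return hyp


def label_looping_greedy(frames, joint, max_symbols=10):
    """Swapped loops: outer loop over labels, inner loop scans frames for a non-blank."""
    hyp = []
    t = 0             # current frame index
    emitted_here = 0  # labels emitted at frame t so far
    num_frames = len(frames)
    while t < num_frames:
        # Inner loop: the hypothesis (predictor state) is fixed here,
        # so only the frame index advances until a non-blank is found.
        label = joint(frames[t], hyp)
        while label == BLANK:
            t += 1
            emitted_here = 0
            if t == num_frames:
                return hyp
            label = joint(frames[t], hyp)
        # Outer loop body: exactly one label is appended per iteration.
        hyp.append(label)
        emitted_here += 1
        if emitted_here >= max_symbols:
            t += 1  # force progress, mirroring the per-frame cap above
            emitted_here = 0
    return hyp


if __name__ == "__main__":
    frames = [0, 3, 3, 0, 5, 5, 5, 2]
    print(frame_looping_greedy(frames, toy_joint))  # [3, 5, 2]
    print(label_looping_greedy(frames, toy_joint))  # [3, 5, 2]
```

In this sketch both functions return the same hypothesis. The difference is that the label-looping version changes the hypothesis only once per outer iteration, which is the property the abstract relies on when it pairs the swapped loops with parallelized hypothesis manipulation across a batch.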
Related papers
- Developing Real-time Streaming Transformer Transducer For Speech Recognition On Large-scale Dataset (2020) 0.00
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020) 18.58
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00
- GPU-accelerated Viterbi Exact Lattice Decoder For Batched Online And Offline Speech Recognition (2019) 6.34
- Neural Transducer Training: Reduced Memory Consumption With Sample-wise Computation (2022) 0.00
- Self-attention Transducers For End-to-end Speech Recognition (2019) 11.93
- Self-attention Linguistic-acoustic Decoder (2018) 2.26
- Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition (2023) 3.58