HAINAN: Fast And Accurate Transducer For Hybrid-autoregressive ASR
2024 · Hainan Xu, Travis M. Bartley, Vladimir Bataev, et al.
Abstract
We present Hybrid-Autoregressive INference TrANsducers (HAINAN), a novel architecture for speech recognition that extends the Token-and-Duration Transducer (TDT) model. Trained with randomly masked predictor network outputs, HAINAN supports both autoregressive inference with all network components and non-autoregressive inference without the predictor. Additionally, we propose a novel semi-autoregressive inference paradigm that first generates an initial hypothesis using non-autoregressive inference, followed by refinement steps where each token prediction is regenerated using parallelized autoregression on the initial hypothesis. Experiments on multiple datasets across different languages demonstrate that HAINAN achieves efficiency parity with CTC in non-autoregressive mode and with TDT in autoregressive mode. In terms of accuracy, autoregressive HAINAN outperforms TDT and RNN-T, while non-autoregressive HAINAN significantly outperforms CTC. Semi-autoregressive inference further enhan
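The three inference modes described in the abstract can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' implementation: the additive `tanh` joint, masking implemented as zeroing the predictor output, a stateless predictor that is just an embedding of the previous token, and one token per encoder frame with no blank or duration prediction (which a real TDT-style model has). The names `joint`, `train_joint`, `nar_decode`, and `refine` are hypothetical.

```python
import math
import random


def joint(enc_t, pred_u, w_out):
    """Assumed additive joint: tanh(enc + pred), then a linear map to vocab logits."""
    hidden = [math.tanh(e + p) for e, p in zip(enc_t, pred_u)]
    return [sum(h * w for h, w in zip(hidden, col)) for col in w_out]


def argmax(xs):
    """Index of the largest element."""
    return max(range(len(xs)), key=xs.__getitem__)


def train_joint(enc_t, pred_u, w_out, rng, mask_prob=0.5):
    """Training-time joint: the predictor output is randomly masked (zeroed here)."""
    if rng.random() < mask_prob:
        pred_u = [0.0] * len(pred_u)
    return joint(enc_t, pred_u, w_out)


def nar_decode(enc, w_out):
    """Non-autoregressive mode: no predictor, so every frame is decoded independently."""
    zero = [0.0] * len(enc[0])
    return [argmax(joint(e, zero, w_out)) for e in enc]


def refine(enc, hyp, embed, w_out, sos=0):
    """One semi-autoregressive step: each position is re-predicted, conditioned on the
    previous token of the current hypothesis. Positions do not depend on each other's
    new outputs, so this loop could run in parallel."""
    prev = [sos] + hyp[:-1]
    return [argmax(joint(e, embed[p], w_out)) for e, p in zip(enc, prev)]


def rand_mat(rng, rows, cols):
    """Random matrix as a list of rows (toy stand-in for trained weights)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, H, V = 6, 8, 5                 # frames, hidden size, vocabulary size
enc = rand_mat(rng, T, H)         # stand-in encoder outputs
embed = rand_mat(rng, V, H)       # toy predictor: one embedding per token
w_out = rand_mat(rng, V, H)       # one weight column per vocabulary entry

hyp0 = nar_decode(enc, w_out)             # initial non-autoregressive hypothesis
hyp1 = refine(enc, hyp0, embed, w_out)    # one refinement pass over that hypothesis
```

Because the model is trained with the predictor sometimes masked, the same joint can be used with or without predictor input. In this sketch, `nar_decode` is simply the `mask_prob=1.0` case applied at inference, and `refine` can be applied repeatedly to the previous pass's output.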
Related papers
- Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model For TTS (2021) 2.26
- Boosting Hybrid Autoregressive Transducer-based ASR With Internal Acoustic Model Training And Dual Blank Thresholding (2024) 2.26
- CIF-T: A Novel CIF-based Transducer Architecture For Automatic Speech Recognition (2023) 0.00
- EffectiveASR: A Single-step Non-autoregressive Mandarin Speech Recognition Architecture With High Accuracy And Inference Speed (2024) 3.58
- Spike-triggered Non-autoregressive Transformer For End-to-end Speech Recognition (2020) 11.39
- TSNAT: Two-step Non-autoregressvie Transformer Models For Speech Recognition (2021) 10.61
- An Improved Single Step Non-autoregressive Transformer For Automatic Speech Recognition (2021) 0.00
- Self-attention Transducers For End-to-end Speech Recognition (2019) 11.93