Boosting Hybrid Autoregressive Transducer-based ASR With Internal Acoustic Model Training And Dual Blank Thresholding
2024 · Takafumi Moriya, Takanori Ashihara, Masato Mimura, et al.
Abstract
A hybrid autoregressive transducer (HAT) is a variant of the neural transducer that models blank and non-blank posterior distributions separately. In this paper, we propose a novel internal acoustic model (IAM) training strategy to enhance HAT-based speech recognition. The IAM consists of encoder and joint networks, which are fully shared and jointly trained with HAT. This joint training not only enhances HAT training efficiency but also encourages IAM and HAT to emit blanks synchronously, which skips the more expensive non-blank computation and results in more effective blank thresholding for faster decoding. Experiments demonstrate that the relative error reductions of the HAT with IAM compared to the vanilla HAT are statistically significant. Moreover, we introduce dual blank thresholding, which combines HAT- and IAM-blank thresholding, together with a compatible decoding algorithm. This results in a 42-75% decoding speed-up with no major performance degradation.
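The idea of blank thresholding can be illustrated with a small sketch. It relies on the standard HAT factorization, where the blank probability comes from a sigmoid and the label probabilities are a softmax scaled by (1 - P(blank)). Everything else here is an assumption made for illustration and is not taken from the paper: the function names, the default thresholds of 0.9, and in particular the order in which the two checks are combined (IAM first, then HAT). The paper's actual decoding algorithm may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_blank_step(iam_blank_logit, hat_blank_logit_fn, label_logits_fn,
                    iam_threshold=0.9, hat_threshold=0.9):
    """One decoding step with two blank checks (hypothetical sketch).

    iam_blank_logit: blank logit from the encoder-only IAM branch (assumed cheap).
    hat_blank_logit_fn: callable returning the HAT blank logit (assumed costlier).
    label_logits_fn: callable returning non-blank label logits (most expensive).
    Returns (decision, p_blank, label_probs); label_probs is None when skipped.
    """
    # Check 1 (assumed): a confident IAM blank skips all HAT computation.
    p_iam_blank = sigmoid(iam_blank_logit)
    if p_iam_blank >= iam_threshold:
        return "iam_blank", p_iam_blank, None

    # Check 2: a confident HAT blank skips the label softmax.
    p_blank = sigmoid(hat_blank_logit_fn())
    if p_blank >= hat_threshold:
        return "hat_blank", p_blank, None

    # Otherwise compute the full HAT distribution:
    # P(label) = (1 - P(blank)) * softmax(label_logits)[label]
    label_probs = [(1.0 - p_blank) * p for p in softmax(label_logits_fn())]
    return "full", p_blank, label_probs
```

In this sketch the expensive callables run only when the cheaper blank checks fail, which is where a decoding speed-up would come from. Raising either threshold makes skipping rarer, trading speed for fidelity to the full distribution.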
Related papers
- Modular Hybrid Autoregressive Transducer (2022) 8.35
- On Minimum Word Error Rate Training Of The Hybrid Autoregressive Transducer (2020) 4.52
- HAINAN: Fast And Accurate Transducer For Hybrid-autoregressive ASR (2024) 0.00
- Transformer-based Acoustic Modeling For Hybrid Speech Recognition (2019) 16.30
- Investigating Methods To Improve Language Model Integration For Attention-based Encoder-decoder ASR Models (2021) 0.00
- 4D ASR: Joint Modeling Of CTC, Attention, Transducer, And Mask-predict Decoders (2022) 7.50
- Audio-attention Discriminative Language Model For ASR Rescoring (2019) 9.23
- Adversarial Defense For Deep Speaker Recognition Using Hybrid Adversarial Training (2020) 9.59