HMM-free Encoder Pre-training For Streaming RNN Transducer
2021 · Lu Huang, Jingyu Sun, Yufeng Tang, et al.
Abstract
This work describes an encoder pre-training procedure that uses frame-wise labels to improve the training of streaming recurrent neural network transducer (RNN-T) models. A streaming RNN-T trained from scratch usually performs worse than a non-streaming RNN-T. Although it is common to address this issue by pre-training components of the RNN-T with other criteria or with frame-wise alignment guidance, the alignment is not easily available in an end-to-end manner. In this work, the frame-wise alignment used to pre-train the streaming RNN-T's encoder is generated without using an HMM-based system. Therefore, an all-neural framework equipped with HMM-free encoder pre-training is constructed. This is achieved by expanding the spikes of a CTC model to their left/right blank frames, and two expanding strategies are proposed. To the best of our knowledge, this is the first work to simulate HMM-based frame-wise labels using a CTC model for pre-training. Experiments conducted on LibriSpeech and MLS English tasks show the proposed pre-t
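The spike-expansion idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the two expanding strategies, so the `"left"` and `"right"` modes below are hypothetical stand-ins and not the authors' method. The function name `expand_spikes`, the blank index `0`, and the use of a greedy per-frame CTC output as input are likewise assumptions made for illustration.

```python
from typing import List

BLANK = 0  # assumed index of the CTC blank symbol


def expand_spikes(frames: List[int], direction: str = "left",
                  blank: int = BLANK) -> List[int]:
    """Turn a spiky per-frame CTC output into dense frame-wise labels.

    frames: greedy (argmax) label id per frame, mostly blanks with
        isolated non-blank "spikes".
    direction: "left" spreads each spike over the blank frames that
        precede it; "right" spreads it over the blank frames that follow.
        Both modes are illustrative assumptions, not the paper's strategies.
    """
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")

    out = list(frames)
    # Spreading leftwards means every blank copies the next spike to its
    # right, so scan from the end; spreading rightwards scans from the start.
    if direction == "left":
        order = range(len(out) - 1, -1, -1)
    else:
        order = range(len(out))

    current = blank  # frames with no spike on the chosen side stay blank
    for t in order:
        if out[t] != blank:
            current = out[t]
        else:
            out[t] = current
    return out


ctc_output = [0, 0, 5, 0, 0, 7, 0]
left_labels = expand_spikes(ctc_output, "left")
right_labels = expand_spikes(ctc_output, "right")
```

For the toy input above, `left_labels` is `[5, 5, 5, 7, 7, 7, 0]` and `right_labels` is `[0, 0, 5, 5, 5, 7, 7]`. Dense labels of this kind could then serve as frame-wise targets for a cross-entropy pre-training stage of the encoder. In this sketch, frames with no spike on the chosen side remain blank.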
Related papers
- Exploring Pre-training With Alignments For RNN Transducer Based End-to-end Speech Recognition (2020) 9.41
- Mask-CTC-based Encoder Pre-training For Streaming End-to-end Speech Recognition (2023) 0.00
- One In A Hundred: Select The Best Predicted Sequence From Numerous Candidates For Streaming Speech Recognition (2020) 0.00
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) 0.00
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020) 18.58
- Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With RNN-transducer (2018) 16.21
- Focused Discriminative Training For Streaming CTC-trained Automatic Speech Recognition Models (2024) 0.00
- Alignment Restricted Streaming Recurrent Neural Network Transducer (2020) 11.19