Improving Transformer-based Speech Recognition Using Unsupervised Pre-training
2019 · Dongwei Jiang, Xiaoning Lei, Wubo Li, et al.
Abstract
Speech recognition technologies are gaining enormous popularity in various industrial applications. However, building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, we propose an unsupervised pre-training method called Masked Predictive Coding (MPC), which can be applied to unsupervised pre-training of Transformer-based models. Experiments on HKUST show that, using the same training data, we achieve a character error rate (CER) of 23.3%, exceeding the best end-to-end model by over 0.2% absolute CER. With more pre-training data, we can further reduce the CER to 21.0%, an 11.8% relative CER reduction over the baseline.
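The sketch below gives a rough illustration of the idea behind Masked Predictive Coding. It corrupts a fraction of the input feature frames and then scores the reconstruction only at the corrupted positions. This is a minimal sketch and not the authors' implementation. Several details are assumptions made for illustration:

- the 15% masking rate;
- the BERT-style 80/10/10 split between zeroing a frame, replacing it with a random frame, and leaving it unchanged;
- the L1 loss;
- the hypothetical helper names `mask_frames` and `mpc_l1_loss`.

The Transformer encoder that would produce the predictions is omitted.

```python
import random
from typing import List, Optional, Tuple

Frame = List[float]  # one acoustic feature vector (e.g. a filterbank frame)


def mask_frames(
    frames: List[Frame], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[Frame], List[int]]:
    """Corrupt a random subset of frames; return (corrupted, masked_indices).

    Assumed BERT-style scheme (not confirmed by the abstract): each selected
    frame is zeroed with prob. 0.8, replaced by a random frame with prob. 0.1,
    and left unchanged with prob. 0.1.
    """
    rng = random.Random(seed)
    corrupted = [list(f) for f in frames]  # copy so the targets stay intact
    masked: List[int] = []
    for i in range(len(frames)):
        if rng.random() < mask_prob:
            masked.append(i)
            r = rng.random()
            if r < 0.8:
                corrupted[i] = [0.0] * len(frames[i])  # zero out the frame
            elif r < 0.9:
                corrupted[i] = list(frames[rng.randrange(len(frames))])
            # otherwise keep the original frame, but still predict it
    return corrupted, masked


def mpc_l1_loss(
    predicted: List[Frame], target: List[Frame], masked: List[int]
) -> float:
    """Mean absolute error over the masked positions only (assumed L1 loss)."""
    total, count = 0.0, 0
    for i in masked:
        for p, t in zip(predicted[i], target[i]):
            total += abs(p - t)
            count += 1
    return total / count if count else 0.0
```

In a training loop, `predicted` would be the encoder output computed from `corrupted`, and the loss would be averaged over a batch. A practical version would use a tensor library rather than nested Python lists.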
Related papers
- A Further Study Of Unsupervised Pre-training For Transformer Based Speech Recognition (2020)
- Pre-training Transformer Decoder For End-to-end ASR Model With Unpaired Speech Data (2022)
- Supervision-guided Codebooks For Masked Prediction In Speech Pre-training (2022)
- Effective Decoder Masking For Transformer Based End-to-end Speech Recognition (2020)
- Unsupervised Pre-training Of Bidirectional Speech Encoders Via Masked Reconstruction (2020)
- Unsupervised Pre-training For Sequence To Sequence Speech Recognition (2019)
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021)
- Improving Hybrid CTC/Attention End-to-end Speech Recognition With Pretrained Acoustic And Language Model (2021)