A Further Study Of Unsupervised Pre-training For Transformer Based Speech Recognition
2020 · Dongwei Jiang, Wubo Li, Ruixiong Zhang, et al.
Abstract
Building a good speech recognition system usually requires large amounts of transcribed data, which is expensive to collect. To tackle this problem, many unsupervised pre-training methods have been proposed. Among these methods, Masked Predictive Coding (MPC) achieved significant improvements on various speech recognition datasets with a BERT-like masked reconstruction loss and a Transformer backbone. However, many aspects of MPC have not been fully investigated. In this paper, we conduct a further study of MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension to streaming models, and how to better transfer learned knowledge from the pre-training stage to downstream tasks. Experiments revealed that pre-training data with a matching speaking style is more useful for downstream recognition tasks. A unified training objective with APC and MPC provided an 8.46% relative error reduction on a streaming model trained on HKUST. Also, the combination of target data ada
Related papers
- Improving Transformer-based Speech Recognition Using Unsupervised Pre-training (2019) · 0.00
- Analysing The Masked Predictive Coding Training Criterion For Pre-training A Speech Representation Model (2023) · 4.52
- Pre-training Transformer Decoder For End-to-end ASR Model With Unpaired Speech Data (2022) · 13.47
- Generative Pre-training For Speech With Autoregressive Predictive Coding (2019) · 14.73
- Mask-CTC-based Encoder Pre-training For Streaming End-to-end Speech Recognition (2023) · 0.00
- Unsupervised Pre-training Of Bidirectional Speech Encoders Via Masked Reconstruction (2020) · 12.33
- Masked Pre-trained Encoder Base On Joint CTC-Transformer (2020) · 0.00
- Unsupervised Pre-training For Sequence To Sequence Speech Recognition (2019) · 0.00