Generative Pre-training For Speech With Autoregressive Predictive Coding
2019 · Yu-An Chung, James Glass
Abstract
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging. In this paper we propose to use autoregressive predictive coding (APC), a recently proposed self-supervised objective, as a generative pre-training approach for learning meaningful, non-specific, and transferable speech representations. We pre-train APC on large-scale unlabeled data and conduct transfer learning experiments on three speech applications that require different information about speech characteristics to perform well: speech recognition, speech translation, and speaker identification. Extensive experiments show that APC not only outperforms surface features (e.g., log Mel spectrograms) and other popular representation learning methods on all three tasks, but is also effective at reducing downstream labeled data size and model parameters. We also investigate the use of Transformers for modeling APC and find it superior to RNNs.
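The abstract names the APC objective without spelling it out. As a rough illustration, the sketch below assumes the usual formulation: a model reads the frames up to time t and predicts the frame a fixed number of steps ahead, and training minimizes the L1 error against the true future frame. The function names, the `shift` parameter, and the toy "repeat the last frame" predictor are hypothetical stand-ins rather than the authors' code. In the paper the predictor would be an RNN or Transformer operating on log Mel frames.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # one feature vector, e.g. one log Mel frame


def apc_l1_loss(frames: Sequence[Frame],
                predict: Callable[[Sequence[Frame]], Frame],
                shift: int = 3) -> float:
    """Mean L1 error between predicted and true frames `shift` steps ahead.

    `predict` sees only frames[0..t] (the autoregressive constraint) and
    must guess frames[t + shift]. The default shift of 3 is an arbitrary
    illustrative value.
    """
    if shift < 1:
        raise ValueError("shift must be >= 1")
    total, count = 0.0, 0
    for t in range(len(frames) - shift):
        prediction = predict(frames[: t + 1])   # past and present only
        target = frames[t + shift]              # the future frame
        total += sum(abs(p - x) for p, x in zip(prediction, target))
        count += len(target)
    return total / count if count else 0.0


def repeat_last_frame(history: Sequence[Frame]) -> Frame:
    """Hypothetical baseline predictor: copy the most recent frame."""
    return list(history[-1])


# Toy 2-dimensional "spectrogram" whose values rise by 1 per frame.
toy_frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
loss = apc_l1_loss(toy_frames, repeat_last_frame, shift=1)
print(loss)
```

A larger `shift` makes copying the current frame a worse strategy, which is the intuition for predicting several steps ahead: the model has to capture structure beyond local smoothness.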
Related papers
- Bi-APC: Bidirectional Autoregressive Predictive Coding For Unsupervised Pre-training And Its Application To Children's ASR (2021)
- Improved Speech Representations With Multi-target Autoregressive Predictive Coding (2020)
- An Unsupervised Autoregressive Model For Speech Representation Learning (2019)
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022)
- Non-autoregressive Predictive Coding For Learning Speech Representations From Local Dependencies (2020)
- A Further Study Of Unsupervised Pre-training For Transformer Based Speech Recognition (2020)
- Generative Pre-trained Speech Language Model With Efficient Hierarchical Transformer (2024)
- Improving Transformer-based Speech Recognition Using Unsupervised Pre-training (2019)