Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition
2022 · Aparna Khare, Minhua Wu, Saurabhchand Bhati, et al.
Abstract
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Automatic Speech Recognition (ASR) model. We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC). Our proposed method maximizes the mutual information between representations from a prior-knowledge model and the output of the model being pre-trained, which allows prior knowledge to be injected during pre-training. We validate our method on three ASR tasks: German, French and English. Our method outperforms CPC pre-training on all three datasets. Compared to training from scratch, it reduces the Word Error Rate (WER) by 4.44%, 6.55% and 15.43% relative on the German, French and English (LibriSpeech) tasks respectively, whereas CPC pre-training yields relative WER reductions of only 2.96%, 1.01% and 14.39% respectively.
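The abstract does not spell out the training objective, so the following is only a minimal sketch of the InfoNCE-style contrastive loss commonly used as a lower bound on mutual information in CPC, not the authors' implementation. The function name `info_nce`, the dot-product scoring, and the toy vectors are assumptions made for illustration. The one idea taken from the abstract is where the targets come from: in CPC they would be the model's own latent representations, while in GCPC they would be representations from a prior-knowledge model.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(predictions, targets):
    """Mean InfoNCE loss over frames (illustrative sketch).

    predictions[t] is the output of the model being pre-trained at frame t.
    targets[t] is the matching (positive) representation; the targets of
    all other frames serve as negatives.
    """
    total = 0.0
    for t, pred in enumerate(predictions):
        scores = [dot(pred, tgt) for tgt in targets]
        # Numerically stable log-sum-exp over positive and negative scores.
        m = max(scores)
        log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of picking the positive target.
        total += log_denom - scores[t]
    return total / len(predictions)


# Toy stand-in for prior-knowledge model representations (hypothetical values).
guide_targets = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]

# Model outputs that line up with the guide targets, frame by frame.
aligned = [row[:] for row in guide_targets]
# Model outputs that are off by one frame, so every positive is mismatched.
shifted = guide_targets[1:] + guide_targets[:1]

print("aligned loss:", info_nce(aligned, guide_targets))
print("shifted loss:", info_nce(shifted, guide_targets))
```

In this toy setting the aligned outputs give a much lower loss than the shifted ones, which is the behavior a contrastive objective is meant to reward. A real system would compute the scores on batched tensors and sample negatives rather than use every other frame.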
Related papers
- Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words (2021)
- Supervision-guided Codebooks For Masked Prediction In Speech Pre-training (2022)
- Aligned Contrastive Predictive Coding (2021)
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021)
- Joint Masked CPC And CTC Training For ASR (2020)
- Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain (2020)
- Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation (2021)
- SCaLa: Supervised Contrastive Learning For End-to-end Speech Recognition (2021)