Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain
2020 · Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, et al.
Abstract
Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by a factor of 12-
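The idea of augmenting only the past segments can be illustrated with a minimal sketch in plain Python. This is not WavAugment's API: the function names (`add_noise`, `add_reverb`, `augment_past_only`), the parameter values, and the single-echo stand-in for reverberation are all assumptions made for illustration, and pitch modification is omitted because it needs a resampling routine.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def add_reverb(signal, decay, delay):
    """Crude stand-in for reverberation: mix in one delayed, attenuated echo."""
    out = list(signal)
    for i in range(delay, len(signal)):
        out[i] += decay * signal[i - delay]
    return out


def augment_past_only(waveform, split, snr_db=15.0, decay=0.3, delay=80, seed=0):
    """Split a waveform at `split` and augment only the past (context) part.

    The future part, which the model is trained to predict, is left untouched.
    All default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    past, future = waveform[:split], waveform[split:]
    past = add_reverb(add_noise(past, snr_db, rng), decay, delay)
    return past, future


# Usage: a 0.1 s, 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
waveform = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
past, future = augment_past_only(waveform, split=1200)
```

In this sketch the contrastive model would encode `past` as context and be asked to identify the clean `future` among negatives; how the split and the negatives are chosen in practice is outside what the abstract specifies.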
Related papers
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022)
- Automatic Data Augmentation Selection And Parametrization In Contrastive Self-supervised Speech Representation Learning (2022)
- Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words (2021)
- CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)
- LPC Augment: An LPC-based ASR Data Augmentation Algorithm For Low And Zero-resource Children's Dialects (2022)
- Significance Of Data Augmentation For Improving Cleft Lip And Palate Speech Recognition (2021)
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021)
- Learning Disentangled Speech Representations With Contrastive Learning And Time-invariant Retrieval (2024)