Variable-rate Hierarchical CPC Leads To Acoustic Unit Discovery In Speech
2022 · Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, et al.
Abstract
The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones. In this paper we explore self-supervised learning of hierarchical representations of speech by applying multiple levels of Contrastive Predictive Coding (CPC). We observe that simply stacking two CPC models does not yield significant improvements over single-level architectures. Inspired by the fact that speech is often described as a sequence of discrete units unevenly distributed in time, we propose a model in which the output of a low-level CPC module is non-uniformly downsampled to directly minimize the loss of a high-level CPC module. The latter is designed to also enforce a prior of separability and discreteness in its representations by enforcing dissimilarity of successive high-level representations through focused negative sampling, and by quantization of the prediction targets. Accounting for the structure
Related papers
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021) · 9.92
- Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words (2021) · 9.23
- Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation (2021) · 0.00
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021) · 9.23
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022) · 0.00
- Aligned Contrastive Predictive Coding (2021) · 9.23
- Hierarchical Multitask Learning For CTC-based Speech Recognition (2018) · 0.00
- Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain (2020) · 12.81