Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words
2021 · Santiago Cuervo, Maciej Grabias, Jan Chorowski, et al.
Abstract
We investigate the performance of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC) on phoneme categorization and on phoneme and word segmentation. Our experiments show that with existing algorithms there is a trade-off between categorization and segmentation performance. We investigate the source of this conflict and conclude that the use of context-building networks, albeit necessary for superior performance on categorization tasks, harms segmentation performance by causing a temporal shift in the learned representations. Aiming to bridge this gap, we take inspiration from the leading approach to segmentation, which simultaneously models the speech signal at the frame and phoneme levels, and incorporate multi-level modelling into Aligned CPC (ACPC), a variation of CPC that exhibits the best performance on categorization tasks. Our multi-level ACPC (mACPC) improves in all categorization metrics and achieves state-of-the-art performance in wor
Related papers
- Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation (2021) · 0.00
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021) · 9.92
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022) · 0.00
- Aligned Contrastive Predictive Coding (2021) · 9.23
- Predicting Within And Across Language Phoneme Recognition Performance Of Self-supervised Learning Speech Pre-trained Models (2022) · 0.00
- Data Augmenting Contrastive Learning Of Speech Representations In The Time Domain (2020) · 12.81
- Variable-rate Hierarchical CPC Leads To Acoustic Unit Discovery In Speech (2022) · 0.00
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021) · 9.23