Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation
2021 · Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, et al.
Abstract
Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame is predicted given past context. However, CPC only looks at the audio signal's frame-level structure. We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level. In this framework, a convolutional neural network learns frame-level representations from the raw waveform via noise-contrastive estimation (NCE). A differentiable boundary detector finds variable-length segments, which are then used to optimize a segment encoder via NCE to learn segment representations. The differentiable boundary detector allows us to train frame-level and segment-level encoders jointly. Typically, phoneme and word segmentation are treated as separate tasks. We un
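To make the two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes frame embeddings `z` are already available (the paper learns them with a CNN from the raw waveform), scores the true next frame against randomly drawn distractor frames with an InfoNCE-style loss, and picks boundaries as peaks in adjacent-frame dissimilarity. The function names, the use of cosine similarity, the `threshold` value, and the hard (non-differentiable) peak picker are all illustrative assumptions; the abstract does not specify how the differentiable boundary detector is built.

```python
import numpy as np


def cosine_sim(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def next_frame_nce_loss(z, num_negatives=4, seed=0):
    """InfoNCE-style loss: frame t must pick frame t+1 over random distractors.

    z: array of shape (T, D) holding frame embeddings (assumed given).
    Requires T - 2 >= num_negatives so enough distractors exist.
    """
    rng = np.random.default_rng(seed)
    T = z.shape[0]
    sim = cosine_sim(z, z)  # (T, T) similarity matrix
    losses = []
    for t in range(T - 1):
        # Distractors: any frame other than the anchor and the true next frame.
        candidates = [i for i in range(T) if i not in (t, t + 1)]
        neg = rng.choice(candidates, size=num_negatives, replace=False)
        # Positive score goes first, so the target class index is 0.
        logits = np.concatenate(([sim[t, t + 1]], sim[t, neg]))
        logits = logits - logits.max()  # numerical stability
        log_prob = logits[0] - np.log(np.exp(logits).sum())
        losses.append(-log_prob)
    return float(np.mean(losses))


def peak_boundaries(z, threshold=0.05):
    """Hard boundary picker (an assumption, NOT the paper's differentiable one).

    Returns indices of frames that start a new segment, found as local peaks
    in the cosine dissimilarity between adjacent frames.
    """
    zn = z / np.linalg.norm(z, axis=-1, keepdims=True)
    d = 1.0 - np.sum(zn[:-1] * zn[1:], axis=-1)  # d[t]: frames t vs t+1
    boundaries = []
    for t in range(1, len(d) - 1):
        if d[t] - d[t - 1] > threshold and d[t] - d[t + 1] > threshold:
            boundaries.append(t + 1)  # new segment begins at frame t+1
    return boundaries
```

In this sketch the boundaries only cut the frame sequence into variable-length segments; in the framework described above, those segments would then be passed to a segment encoder trained with a second NCE objective, which is omitted here.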
Related papers
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021)
- Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words (2021)
- Self-supervised Contrastive Learning For Unsupervised Phoneme Segmentation (2020)
- Aligned Contrastive Predictive Coding (2021)
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022)
- Word Segmentation On Discovered Phone Units With Dynamic Programming And Self-supervised Scoring (2022)
- Analyzing Speaker Information In Self-supervised Models To Improve Zero-resource Speech Processing (2021)
- Scala: Supervised Contrastive Learning For End-to-end Speech Recognition (2021)