Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding
2021 · Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, et al.
Abstract
Typically, unsupervised segmentation of speech into phone-like and word-like units is treated as two separate tasks, often handled by different methods that do not fully leverage the interdependence of the two. Here, we unify them and propose a technique that performs both jointly, showing that the two tasks indeed benefit from each other. Recent attempts employ self-supervised learning, such as contrastive predictive coding (CPC), where the next frame is predicted given past context. However, CPC only looks at the audio signal's frame-level structure. We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that models the signal structure at a higher level, e.g., the phone level. A convolutional neural network learns frame-level representations from the raw waveform via noise-contrastive estimation (NCE). A differentiable boundary detector finds variable-length segments, which are then used to optimize a segment encoder via NCE to learn segmen
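To make the contrastive objective mentioned in the abstract concrete, here is a minimal sketch of an NCE-style (InfoNCE) next-frame loss in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine`, `info_nce`, and `next_frame_loss`, the use of cosine similarity as the score, the `temperature` parameter, and the choice to use the current frame directly as the prediction of the next one are all hypothetical choices made for this example. The frame vectors stand in for the output of a learned encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def info_nce(prediction, positive, negatives, temperature=1.0):
    """Negative log-probability of picking the positive among all candidates.

    Assumption: candidates are scored by cosine similarity to the prediction.
    """
    scores = [cosine(prediction, positive) / temperature]
    scores += [cosine(prediction, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - scores[0]


def next_frame_loss(frames, temperature=1.0):
    """Average InfoNCE loss for predicting frame t+1 from frame t.

    Assumption: frame t itself serves as the prediction, and every other
    frame in the sequence serves as a negative (distractor).
    """
    losses = []
    for t in range(len(frames) - 1):
        negatives = [f for i, f in enumerate(frames) if i not in (t, t + 1)]
        losses.append(info_nce(frames[t], frames[t + 1], negatives, temperature))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy "utterance": two similar frames followed by two different ones.
    toy_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(next_frame_loss(toy_frames))
```

In this toy setup the loss is low where adjacent frames are similar and high where they differ, which is the kind of signal a boundary detector could exploit. The same loss could in principle be applied to segment-level vectors instead of frames, but how the paper does so is not specified in the text available here.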
Related papers
- Segmental Contrastive Predictive Coding For Unsupervised Word Segmentation (2021) · 0.00
- Contrastive Prediction Strategies For Unsupervised Segmentation And Categorization Of Phonemes And Words (2021) · 9.23
- Word Segmentation On Discovered Phone Units With Dynamic Programming And Self-supervised Scoring (2022) · 9.23
- Aligned Contrastive Predictive Coding (2021) · 9.23
- Contrastive Separative Coding For Self-supervised Representation Learning (2021) · 0.00
- Variable-rate Hierarchical CPC Leads To Acoustic Unit Discovery In Speech (2022) · 0.00
- Neural Predictive Coding Using Convolutional Neural Networks Towards Unsupervised Learning Of Speaker Characteristics (2018) · 11.85
- Self-supervised Contrastive Learning For Unsupervised Phoneme Segmentation (2020) · 12.68