Towards Unsupervised Phone And Word Segmentation Using Self-supervised Vector-quantized Neural Networks
2020 · Herman Kamper, Benjamin van Niekerk
Abstract
We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units. Two segmentation methods are considered. In the first, features are greedily merged until a prespecified number of segments is reached. The second uses dynamic programming to optimize a squared error with a penalty term that encourages fewer but longer segments. We show that these VQ segmentation methods can be used without alteration across a wide range of tasks: unsupervised phone segmentation, ABX phone discrimination, same-different word discrimination, and as inputs to a symbolic word segmentation algorithm. The penalized dynamic programming method generally performs best. While performance on individual tasks is only comparable to the
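The penalized dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the squared error is the Euclidean distance between each frame and a single code shared by its block, and that the penalty is a constant cost added per segment (the abstract does not give the exact form of the penalty). The names `penalized_segment`, `penalty`, and `max_len` are hypothetical.

```python
import numpy as np


def penalized_segment(features, codebook, penalty=1.0, max_len=None):
    """Split (T, D) features into contiguous blocks, one codebook entry each.

    Minimizes: sum of squared errors to the assigned codes
               + penalty * number_of_segments   (assumed penalty form).
    Returns a list of (start, end, code_index) with end exclusive.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    T = len(features)
    max_len = T if max_len is None else max_len

    # dist[t, k]: squared distance from frame t to code k.
    dist = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # cum[t] holds the column sums of dist[:t], so block [s, t) costs cum[t] - cum[s].
    cum = np.vstack([np.zeros((1, len(codebook))), np.cumsum(dist, axis=0)])

    best = np.full(T + 1, np.inf)        # best[t]: minimal cost of frames [0, t)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)    # start of the last block ending at t
    code_at = np.zeros(T + 1, dtype=int) # code assigned to that last block

    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            block = cum[t] - cum[s]      # cost of block [s, t) under every code
            k = int(block.argmin())
            cost = best[s] + block[k] + penalty
            if cost < best[t]:
                best[t], back[t], code_at[t] = cost, s, k

    # Follow the back-pointers from the end to recover the segmentation.
    segments = []
    t = T
    while t > 0:
        s = int(back[t])
        segments.append((s, t, int(code_at[t])))
        t = s
    return segments[::-1]


# Toy example: 1-D features, two codes at 0 and 10.
toy_codebook = [[0.0], [10.0]]
toy_features = [[0.1], [0.0], [-0.1], [10.2], [9.9]]
print(penalized_segment(toy_features, toy_codebook, penalty=1.0))
```

In this sketch a larger `penalty` yields fewer, longer segments, while a penalty near zero lets every frame take its own nearest code. The greedy alternative mentioned in the abstract would instead merge adjacent blocks repeatedly until a target segment count is reached.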
Related papers
- Word Segmentation On Discovered Phone Units With Dynamic Programming And Self-supervised Scoring (2022)
- vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations (2019)
- Learning Disentangled Phone And Speaker Representations In A Semi-supervised VQ-VAE Paradigm (2020)
- Unsupervised Speech Segmentation And Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding (2021)
- Speech Enhancement Using Self-supervised Pre-trained Model And Vector Quantization (2022)
- Robust Training Of Vector Quantized Bottleneck Models (2020)
- Self-supervised Learning With Random-projection Quantizer For Speech Recognition (2022)
- Vector-quantized Neural Networks For Acoustic Unit Discovery In The ZeroSpeech 2020 Challenge (2020)