An Embedded Segmental K-means Model For Unsupervised Segmentation And Clustering Of Speech
2017 · Herman Kamper, Karen Livescu, Sharon Goldwater
Abstract
Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite competitive performance in previous work, the full Bayesian approach is difficult to scale to large speech corpora. We introduce an approximation to a recent Bayesian model that still has a clear objective function but improves efficiency by using hard clustering and segmentation rather than full Bayesian inference. Like its Bayesian counterpart, this embedded segmental K-means model (ES-KMeans) represents arbitrary-length word segments as fixed-dimensional acoustic word embeddings. We first compare ES-KMeans to previous approaches on common English and Xitsonga data sets (5 and 2.5 hours of speech): ES-KMeans outperforms a leading heuristic method in word segmentation, giving similar scores t
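The alternation the abstract describes, hard segmentation followed by hard clustering over fixed-dimensional segment embeddings, can be illustrated with a minimal sketch. Everything specific here is an assumption and is not taken from the text above: the downsampling embedding in `embed`, the duration-weighted squared-distance cost, the random initialisation, and all function and parameter names (`segment`, `es_kmeans_sketch`, `max_len`, `n_iter`) are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def embed(frames, n=3):
    """Map a variable-length segment (T x D) to a fixed-size vector.
    Assumption: uniform downsampling to n frames, then flattening."""
    idx = np.linspace(0, len(frames) - 1, n).round().astype(int)
    return frames[idx].ravel()

def segment(frames, centroids, max_len):
    """Hard segmentation by dynamic programming. Each candidate segment is
    scored by its duration times the squared distance from its embedding to
    the nearest centroid (an assumed cost)."""
    T = len(frames)
    best = np.full(T + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)
    label = np.zeros(T + 1, dtype=int)
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            dists = ((centroids - embed(frames[s:t])) ** 2).sum(axis=1)
            k = int(dists.argmin())
            cost = best[s] + (t - s) * dists[k]
            if cost < best[t]:
                best[t], back[t], label[t] = cost, s, k
    # Backtrack from the end of the utterance to recover segments and labels.
    bounds, labels = [], []
    t = T
    while t > 0:
        bounds.append((int(back[t]), t))
        labels.append(int(label[t]))
        t = int(back[t])
    return bounds[::-1], labels[::-1]

def es_kmeans_sketch(utterances, K, max_len=6, n_iter=5, seed=0):
    """Alternate between segmenting every utterance under the current
    centroids and recomputing each centroid as the mean of its segments."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: embeddings of randomly placed chunks.
    init = []
    for _ in range(K):
        u = utterances[rng.integers(len(utterances))]
        s = rng.integers(0, max(1, len(u) - max_len + 1))
        init.append(embed(u[s:s + max_len]))
    centroids = np.array(init, dtype=float)
    result = []
    for _ in range(n_iter):
        embs, assign, result = [], [], []
        for u in utterances:
            bounds, labels = segment(u, centroids, max_len)
            result.append((bounds, labels))
            embs += [embed(u[s:t]) for s, t in bounds]
            assign += labels
        embs, assign = np.array(embs), np.array(assign)
        for k in range(K):
            if np.any(assign == k):  # empty clusters keep their old centroid
                centroids[k] = embs[assign == k].mean(axis=0)
    # The returned segmentation is the one found before the final update.
    return centroids, result
```

Calling `es_kmeans_sketch(utterances, K=3)` on a list of `(T, D)` feature arrays returns the centroids and, per utterance, a list of `(start, end)` frame boundaries with one cluster label per segment. Weighting each segment by its duration keeps the segmenter from trivially preferring a few long segments.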
Related papers
- Unsupervised Lexicon Learning From Speech Is Limited By Representations Rather Than Clustering (2025)
- Unsupervised Word Discovery: Boundary Detection With Clustering Vs. Dynamic Programming (2024)
- Unsupervised Word Segmentation And Lexicon Discovery Using Acoustic Word Embeddings (2016)
- Unsupervised Neural And Bayesian Models For Zero-resource Speech Processing (2017)
- Unsupervised Speech Segmentation: A General Approach Using Speech Language Models (2025)
- Multilingual And Unsupervised Subword Modeling For Zero-resource Languages (2018)
- Unsupervised Word Segmentation From Discrete Speech Units In Low-resource Settings (2021)
- Improving Unsupervised Subword Modeling Via Disentangled Speech Representation Learning And Transformation (2019)