Unsupervised Word Segmentation And Lexicon Discovery Using Acoustic Word Embeddings
2016 · Herman Kamper, Aren Jansen, Sharon Goldwater
Abstract
In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modelling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabelled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a small-vocabulary connected digit recognition task by mapping the unsupervised decoded output to ground truth trans
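As a rough illustration of what embedding a segment of arbitrary length in a fixed-dimensional space can look like, here is a minimal sketch. The abstract does not say which embedding function is used. The uniform downsampling below, the helper name `embed_segment`, the choice of 10 samples, and the 13-dimensional frames are all assumptions made for this example.

```python
import numpy as np


def embed_segment(frames: np.ndarray, n_samples: int = 10) -> np.ndarray:
    """Map a (num_frames, feat_dim) segment to one fixed-length vector.

    Hypothetical helper: uniform downsampling is an assumed embedding,
    not a method stated in the abstract.
    """
    num_frames = frames.shape[0]
    # Pick n_samples equally spaced frame indices; short segments repeat frames.
    idx = np.linspace(0, num_frames - 1, n_samples).round().astype(int)
    # Flatten the selected frames into a single vector of size n_samples * feat_dim.
    return frames[idx].reshape(-1)


# Two made-up segments of different durations (random stand-ins for acoustic features).
rng = np.random.default_rng(0)
short_segment = rng.normal(size=(7, 13))
long_segment = rng.normal(size=(53, 13))

e_short = embed_segment(short_segment)
e_long = embed_segment(long_segment)
print(e_short.shape, e_long.shape)
```

Both segments end up with the same dimensionality, so they can be compared or clustered directly in one vector space, which is the property a whole-word acoustic model over embeddings relies on.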
Related papers
- An Embedded Segmental K-means Model For Unsupervised Segmentation And Clustering Of Speech (2017)
- Unsupervised Lexicon Learning From Speech Is Limited By Representations Rather Than Clustering (2025)
- Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints In Encoder-decoder Models (2018)
- Unsupervised Word Discovery: Boundary Detection With Clustering Vs. Dynamic Programming (2024)
- Unsupervised Acoustic Unit Discovery By Leveraging A Language-independent Subword Discriminative Feature Representation (2021)
- Unsupervised Word Segmentation From Discrete Speech Units In Low-resource Settings (2021)
- Segmental Audio Word2vec: Representing Utterances As Sequences Of Vectors With Applications In Spoken Term Detection (2018)
- Completely Unsupervised Phoneme Recognition By Adversarially Learning Mapping Relationships From Audio Embeddings (2018)