Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints In Encoder-decoder Models
2018 · Herman Kamper
Abstract
We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation. In settings where unlabelled speech is the only available resource, such acoustic word embeddings can form the basis for "zero-resource" speech search, discovery and indexing systems. Most existing unsupervised embedding methods still use some supervision, such as word or phoneme boundaries. Here we propose the encoder-decoder correspondence autoencoder (EncDec-CAE), which, instead of true word segments, uses automatically discovered segments: an unsupervised term discovery system finds pairs of words of the same unknown type, and the EncDec-CAE is trained to reconstruct one word given the other as input. We compare it to a standard encoder-decoder autoencoder (AE), a variational AE with a prior over its latent embedding, and downsampling. EncDec-CAE outperforms its closest competitor by 24% relative in average precision on two languages in a word discrimination task.
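To make the evaluation setup concrete, the following is a minimal sketch of the downsampling baseline and of average precision in a word discrimination task. It is not the paper's implementation. The choices of equally spaced frame selection, cosine distance, the function names, and the toy two-dimensional "features" are all assumptions made for illustration.

```python
import math

def downsample(frames, n=10):
    """Map a variable-length list of feature frames to a fixed-length vector
    by keeping n equally spaced frames and flattening them (assumed scheme)."""
    num_frames = len(frames)
    # Indices spread evenly from the first frame to the last one.
    idx = [round(i * (num_frames - 1) / (n - 1)) for i in range(n)]
    return [value for i in idx for value in frames[i]]

def cosine_distance(u, v):
    """Cosine distance between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def average_precision(embeddings, labels):
    """Rank all segment pairs by distance and compute average precision,
    treating pairs of the same word type as the positives."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            dist = cosine_distance(embeddings[i], embeddings[j])
            pairs.append((dist, labels[i] == labels[j]))
    pairs.sort(key=lambda p: p[0])  # closest pairs first
    hits, precisions = 0, []
    for rank, (_, same) in enumerate(pairs, start=1):
        if same:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

# Toy segments of different durations; each frame is a made-up 2-d feature.
segments = [
    [[1.0, 0.0]] * 7,
    [[0.9, 0.1]] * 12,
    [[0.0, 1.0]] * 5,
    [[0.1, 0.9]] * 9,
]
labels = ["a", "a", "b", "b"]  # hypothetical word types

embeddings = [downsample(seg, n=4) for seg in segments]
ap = average_precision(embeddings, labels)
print(len(embeddings[0]), ap)
```

Every segment, whatever its duration, ends up as a vector of the same length (here 4 frames of 2 dimensions each), which is what allows a direct distance comparison between segments. A learned model such as the EncDec-CAE would replace `downsample` with the encoder's output, while the evaluation function could stay unchanged.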