Improved Audio Embeddings By Adjacency-based Clustering With Applications In Spoken Term Detection
2018 · Sung-Feng Huang, Yi-Chen Chen, Hung-Yi Lee, et al.
Abstract
Embedding audio signal segments into vectors with fixed dimensionality is attractive because it makes all subsequent processing, such as modeling, classifying, or indexing, easier and more efficient. The previously proposed Audio Word2Vec was shown to represent audio segments for spoken words as such vectors, carrying information about the phonetic structures of the signal segments. However, each linguistic unit (word, syllable, or phoneme in text form) corresponds to an unlimited number of audio segments, whose vector representations are inevitably spread over the embedding space, which causes some confusion. It is therefore desirable to better cluster the audio embeddings so that those corresponding to the same linguistic unit are more compactly distributed. In this paper, inspired by Siamese networks, we propose some approaches to achieve this goal. These include identifying positive and negative pairs from unlabeled data for Siamese-style training, disentangling acoustic factors
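The Siamese-style training mentioned in the abstract can be illustrated with a generic contrastive loss over pairs of embeddings. This is a minimal sketch under stated assumptions, not the paper's actual objective: the function names, the margin value, and the 2-D example vectors are all hypothetical, and the abstract does not specify which loss the authors use.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_positive, margin=1.0):
    """Generic Siamese contrastive loss for one pair (assumed form).

    Positive pairs (same linguistic unit) are pulled together by
    penalizing their squared distance. Negative pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    d = euclidean(u, v)
    if is_positive:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings, for illustration only.
a = [0.0, 0.0]
b = [0.3, 0.4]  # distance 0.5 from a
c = [3.0, 4.0]  # distance 5.0 from a

pos_loss = contrastive_loss(a, b, is_positive=True)    # penalized for being apart
neg_close = contrastive_loss(a, b, is_positive=False)  # penalized for being too close
neg_far = contrastive_loss(a, c, is_positive=False)    # already beyond the margin
```

Minimizing such a loss over many pairs tends to make embeddings of the same unit more compact while keeping different units separated, which is the clustering effect the abstract aims for.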
Related papers
- Segmental Audio Word2vec: Representing Utterances As Sequences Of Vectors With Applications In Spoken Term Detection (2018)
- Learning Word Embeddings From Speech (2017)
- Phonetic-and-semantic Embedding Of Spoken Words With Applications In Spoken Content Retrieval (2018)
- Audio Word2vec: Unsupervised Learning Of Audio Segment Representations Using Sequence-to-sequence Autoencoder (2016)
- Additional Shared Decoder On Siamese Multi-view Encoders For Learning Acoustic Word Embeddings (2019)
- Unsupervised Spoken Term Discovery Based On Re-clustering Of Hypothesized Speech Segments With Siamese And Triplet Networks (2020)
- Speaker Diarisation Using 2D Self-attentive Combination Of Embeddings (2019)
- Discriminative Acoustic Word Embeddings: Recurrent Neural Network-based Approaches (2016)