Unsupervised Spoken Term Discovery Based On Re-clustering Of Hypothesized Speech Segments With Siamese And Triplet Networks
2020 · Man-Ling Sung, Tan Lee
Abstract
Spoken term discovery from untranscribed speech audio could be achieved via a two-stage process. In the first stage, the unlabelled speech is decoded into a sequence of subword units that are learned and modelled in an unsupervised manner. In the second stage, partial sequence matching and clustering are performed on the decoded subword sequences, resulting in a set of discovered words or phrases. A limitation of this approach is that the results of subword decoding could be erroneous, and the errors would impact the subsequent steps. While a Siamese/Triplet network is one approach to learning segment representations that can improve the discovery process, the challenge in spoken term discovery under a completely unsupervised scenario is that training examples are unavailable. In this paper, we propose to generate training examples from initial hypothesized sequence clusters. The Siamese/Triplet network is trained on the hypothesized examples to measure the similarity between two speech segme
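As a rough illustration of how training examples might be drawn from hypothesized clusters, the following minimal sketch builds (anchor, positive, negative) triplets from cluster assignments and evaluates a standard hinge-style triplet loss. It is not the authors' implementation: the function names `make_triplets` and `triplet_loss`, the random negative sampling, the squared Euclidean distance, and the margin value are all assumptions made for illustration.

```python
import random
from itertools import combinations


def make_triplets(clusters, seed=0):
    """Build (anchor, positive, negative) triplets from hypothesized clusters.

    clusters: dict mapping a hypothesized cluster id to a list of segment ids.
    Anchor and positive share a cluster; the negative is sampled at random
    from the other clusters (sampling strategy is an assumption).
    """
    rng = random.Random(seed)
    triplets = []
    for cid, members in clusters.items():
        # Candidate negatives: every segment outside the current cluster.
        negatives = [s for other, segs in clusters.items()
                     if other != cid for s in segs]
        if len(members) < 2 or not negatives:
            continue  # cannot form a same-cluster pair or find a negative
        for anchor, positive in combinations(members, 2):
            triplets.append((anchor, positive, rng.choice(negatives)))
    return triplets


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on squared Euclidean distances (assumed form)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical cluster assignments produced by the initial discovery stage.
clusters = {"c0": ["seg1", "seg2", "seg3"], "c1": ["seg4", "seg5"]}
triplets = make_triplets(clusters)
```

Because the clusters are only hypotheses, some triplets built this way will be mislabelled; the sketch does nothing to filter them.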
Related papers
- Unsupervised Feature Learning For Speech Using Correspondence And Siamese Networks (2020)
- Improved Audio Embeddings By Adjacency-based Clustering With Applications In Spoken Term Detection (2018)
- Unsupervised Word Segmentation And Lexicon Discovery Using Acoustic Word Embeddings (2016)
- Sampling Strategies In Siamese Networks For Unsupervised Speech Representation Learning (2018)
- Unsupervised Word Discovery: Boundary Detection With Clustering Vs. Dynamic Programming (2024)
- A CTC Triggered Siamese Network With Spatial-temporal Dropout For Speech Recognition (2022)
- An Embedded Segmental K-means Model For Unsupervised Segmentation And Clustering Of Speech (2017)
- A Nonparametric Bayesian Approach For Spoken Term Detection By Example Query (2016)