Sampling Strategies In Siamese Networks For Unsupervised Speech Representation Learning
2018 · Rachid Riad, Corentin Dancette, Julien Karadayi, et al.
Abstract
Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we systematically investigate an often ignored component of siamese networks: the sampling procedure (how pairs of same vs. different tokens are selected). We show that sampling strategies taking into account Zipf's law, the distribution of speakers, and the proportions of same and different pairs of words significantly impact the performance of the network. In particular, we show that word frequency compression improves learning across a wide range of numbers of training pairs. This effect does not apply to the same extent to the fully unsupervised setting, where the pairs of same-different words are obtained by spoken term discovery. We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement over the state of the art in unsupervised representation learning using siamese networks.
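The abstract names three sampling factors (word frequency compression under Zipf's law, the speaker distribution, and the same/different ratio) without spelling out the procedure. As a minimal sketch, assuming a corpus of word tokens annotated with word type and speaker, the Python below shows one way these factors could enter a pair sampler. The function `sample_pairs`, the `log(1 + n)` compression default, and the `cross_speaker_only` and `diff_per_same` parameters are hypothetical illustrations, not the authors' implementation.

```python
import itertools
import math
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A token is (token_id, word_type, speaker). This layout is a hypothetical
# stand-in for whatever a word-segmented corpus actually provides.
Token = Tuple[int, str, str]
Pair = Tuple[int, int]


def sample_pairs(
    tokens: List[Token],
    compress: Callable[[int], float] = lambda n: math.log(1 + n),  # assumed choice
    diff_per_same: float = 1.0,
    cross_speaker_only: bool = False,
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair]]:
    """Sample 'same' and 'different' token pairs with frequency compression.

    Without compression, a word type with n tokens yields n*(n-1)/2 same
    pairs, so the few very frequent words predicted by Zipf's law dominate.
    Here each type contributes round(compress(n)) same pairs, clipped to
    at least one and at most the number of available candidate pairs.
    """
    rng = random.Random(seed)
    by_type: Dict[str, List[Token]] = defaultdict(list)
    for tok in tokens:
        by_type[tok[1]].append(tok)

    same: List[Pair] = []
    for toks in by_type.values():
        # Candidate same pairs, optionally restricted to different speakers.
        candidates = [
            (a[0], b[0])
            for a, b in itertools.combinations(toks, 2)
            if not cross_speaker_only or a[2] != b[2]
        ]
        if not candidates:
            continue
        k = min(len(candidates), max(1, round(compress(len(toks)))))
        same.extend(rng.sample(candidates, k))

    # Draw 'different' pairs uniformly over tokens, rejecting same-type draws,
    # until the requested different:same ratio is reached.
    n_diff = round(diff_per_same * len(same))
    different: List[Pair] = []
    if len(by_type) < 2:
        return same, different
    while len(different) < n_diff:
        a, b = rng.sample(tokens, 2)
        if a[1] != b[1]:
            different.append((a[0], b[0]))
    return same, different
```

Passing `compress=lambda n: n * (n - 1) / 2` recovers exhaustive same-pair sampling, which makes it easy to compare a compressed and an uncompressed training set built from the same tokens; `diff_per_same` controls the proportion of different pairs independently of that choice.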
Related papers
- Unsupervised Feature Learning For Speech Using Correspondence And Siamese Networks (2020)
- Self-supervised Speaker Verification With Simple Siamese Network And Self-supervised Regularization (2021)
- A CTC Triggered Siamese Network With Spatial-temporal Dropout For Speech Recognition (2022)
- Unsupervised Spoken Term Discovery Based On Re-clustering Of Hypothesized Speech Segments With Siamese And Triplet Networks (2020)
- Hypergraph Based Semi-supervised Learning Algorithms Applied To Speech Recognition Problem: A Novel Approach (2018)
- Similarity Analysis Of Self-supervised Speech Representations (2020)
- Few-shot Learning In Emotion Recognition Of Spontaneous Speech Using A Siamese Neural Network With Adaptive Sample Pair Formation (2021)
- Siamese Neural Network With Joint Bayesian Model Structure For Speaker Verification (2021)