Contrastive Augmentation: An Unsupervised Learning Approach For Keyword Spotting In Speech Technology
2024 · Weinan Dai, Yifeng Jiang, Yuanjing Liu, et al.
Abstract
This paper addresses a persistent challenge in Keyword Spotting (KWS), a fundamental component of speech technology: acquiring substantial labeled data for training. Because large quantities of positive samples are difficult to obtain, and new target samples must be laboriously collected whenever the keyword changes, we introduce a novel approach that combines unsupervised contrastive learning with an augmentation-based technique. Our method allows the neural network to train on unlabeled datasets, potentially improving performance in downstream tasks with limited labeled data. We also propose that speech utterances containing the same keyword should share similar high-level feature representations despite variations in speed or volume. To achieve this, we present a speech augmentation-based unsupervised learning method that uses the similarity between the bottleneck layer feature and the audio reconstruction information for auxiliary training. Fu
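The idea that speed- or volume-perturbed versions of one utterance should map to similar representations can be illustrated with a small sketch. The abstract does not state the exact loss or augmentation implementation, so everything here is an assumption: the helper names (`change_volume`, `change_speed`, `nt_xent`) are hypothetical, the speed change is a naive linear-interpolation resampler, and the loss is the generic NT-Xent contrastive objective rather than the authors' formulation. The embeddings are plain lists standing in for bottleneck-layer outputs.

```python
import math


def change_volume(wave, gain):
    """Volume augmentation: scale every sample by a constant gain."""
    return [gain * s for s in wave]


def change_speed(wave, rate):
    """Speed augmentation by linear interpolation; rate > 1 shortens the signal."""
    n_out = max(2, int(len(wave) / rate))
    out = []
    for i in range(n_out):
        pos = i * (len(wave) - 1) / (n_out - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append((1 - frac) * wave[lo] + frac * wave[hi])
    return out


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(views_a, views_b, temperature=0.5):
    """NT-Xent loss: views_a[i] and views_b[i] are two views of utterance i."""
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same utterance
        logits = [cosine(z[i], z[j]) / temperature for j in range(2 * n) if j != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)
```

In a training loop, each unlabeled utterance would be passed through two different augmentations and then through the encoder, and the two resulting embeddings would form one positive pair. The loss is lower when paired views agree more than unpaired ones, which is the behavior the abstract asks of the high-level features.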
Related papers
- Phoneme-level Contrastive Learning For User-defined Keyword Spotting With Flexible Enrollment (2024)
- LLM-Synth4KWS: Scalable Automatic Generation And Synthesis Of Confusable Data For Custom Keyword Spotting (2025)
- Exploring Representation Learning For Small-footprint Keyword Spotting (2023)
- Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech (2024)
- Exploring Sequence-to-sequence Transformer-transducer Models For Keyword Spotting (2022)
- Fully Unsupervised Training Of Few-shot Keyword Spotting (2022)
- Sequence Discriminative Training For Deep Learning Based Acoustic Keyword Spotting (2018)
- A Monaural Speech Enhancement Method For Robust Small-footprint Keyword Spotting (2019)