Exploring Representation Learning For Small-footprint Keyword Spotting
2023 Β· Fan Cui, Liyong Guo, Quandong Wang, et al.
Abstract
In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address those challenges, we explore representation learning for KWS by self-supervised contrastive learning and self-training with pretrained model. First, local-global contrastive siamese networks (LGCSiam) are designed to learn similar utterance-level representations for similar audio samplers by proposed local-global contrastive loss without requiring ground-truth. Second, a self-supervised pretrained Wav2Vec 2.0 model is applied as a constraint module (WVC) to force the KWS model to learn frame-level acoustic representations. By the LGCSiam and WVC modules, the proposed small-footprint KWS model can be pretrained with unlabeled data. Experiments on speech commands dataset show that the self-training WVC module and the self-supervised LGCSiam module significantly improve accuracy, especially in
Authors
(none)
Tags
Stats
Related papers
- Small-footprint Keyword Spotting With Graph Convolutional Network (2019)10.48
- Contrastive Augmentation: An Unsupervised Learning Approach For Keyword Spotting In Speech Technology (2024)9.92
- Phoneme-level Contrastive Learning For User-defined Keyword Spotting With Flexible Enrollment (2024)6.34
- Fully Unsupervised Training Of Few-shot Keyword Spotting (2022)5.24
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017)0.00
- A Separable Temporal Convolution Neural Network With Attention For Small-footprint Keyword Spotting (2021)0.00
- Online Continual Learning In Keyword Spotting For Low-resource Devices Via Pooling High-order Temporal Statistics (2023)7.50
- Sequence Discriminative Training For Deep Learning Based Acoustic Keyword Spotting (2018)8.35