Few-shot Open-set Learning For On-device Customization Of Keyword Spotting Systems
2023 · Manuele Rusci, Tinne Tuytelaars
Abstract
A personalized KeyWord Spotting (KWS) pipeline typically requires training a Deep Learning model on a large set of user-defined speech utterances, which prevents fast customization directly on-device. To fill this gap, this paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier. With user-defined keywords from 10 classes of the Google Speech Commands dataset, our study reports an accuracy of up to 76% in a 10-shot scenario while the false acceptance rate of unknown data is kept to 5%. In the analyzed settings, using the triplet loss to train an encoder with normalized output features performs better than prototypical networks jointly trained with a generator of dummy unknown-class prototypes. This design is also more effective than encoders trained on a classification problem and has fewer parameters than other iso-accuracy approaches.
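The combination of a feature encoder with a prototype-based open-set classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder is omitted and embeddings are passed in as plain lists, and the function names, the use of cosine similarity, and the fixed rejection threshold are assumptions made for illustration. In practice the threshold would presumably be tuned on held-out data to reach a target false acceptance rate such as the 5% quoted above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_prototypes(support):
    """Average the normalized few-shot embeddings of each keyword.

    support: dict mapping a keyword label to a list of embedding vectors
    (hypothetical encoder outputs, one per enrolled utterance).
    """
    prototypes = {}
    for label, embeddings in support.items():
        normed = [l2_normalize(e) for e in embeddings]
        dim = len(normed[0])
        mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(embedding, prototypes, threshold):
    """Return (label, score); label is "unknown" if no prototype is close enough.

    The score is the cosine similarity to the nearest prototype. Rejecting by
    a similarity threshold is an assumed open-set rule, not taken from the paper.
    """
    query = l2_normalize(embedding)
    best_label, best_score = None, -math.inf
    for label, proto in prototypes.items():
        score = sum(q * p for q, p in zip(query, proto))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score


# Toy 2-D embeddings standing in for encoder outputs (2 shots per keyword).
support = {
    "yes": [[1.0, 0.1], [0.9, 0.0]],
    "no": [[0.0, 1.0], [0.1, 0.9]],
}
prototypes = build_prototypes(support)
print(classify([0.95, 0.05], prototypes, threshold=0.8))
print(classify([-1.0, -1.0], prototypes, threshold=0.8))
```

Enrolling a new keyword in this scheme only requires computing one more prototype from a few utterances, with no retraining of the encoder, which is what makes the approach attractive for on-device customization.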
Related papers
- Few-shot Keyword Spotting With Prototypical Networks (2020)
- Boosting Keyword Spotting Through On-device Learnable User Speech Characteristics (2024)
- Fully Unsupervised Training Of Few-shot Keyword Spotting (2022)
- Phoneme-level Contrastive Learning For User-defined Keyword Spotting With Flexible Enrollment (2024)
- Exploring Representation Learning For Small-footprint Keyword Spotting (2023)
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019)
- GE2E-KWS: Generalized End-to-end Training And Evaluation For Zero-shot Keyword Spotting (2024)
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017)