Boosting Keyword Spotting Through On-device Learnable User Speech Characteristics
2024 · Cristian Cioflan, Lukas Cavigelli, Luca Benini
Abstract
Keyword spotting systems for always-on, TinyML-constrained applications require on-site tuning to boost the accuracy of offline-trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, which are often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally intensive and memory-hungry backbone update schemes, unfit for always-on, battery-powered devices. In this work, we propose a novel on-device learning architecture composed of a pretrained backbone and a user-aware embedding that learns the user's speech characteristics. The resulting features are fused and used to classify the input utterance. For domain shifts generated by unseen speakers, we measure relative error rate reductions of up to 19%, from 30.1% to 24.3%, on the 35-class problem of the Google Speech Commands dataset, through the inexpensive update of the user projections. We moreover demo
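The idea in the abstract (a frozen pretrained backbone, a small learnable user representation, fused features feeding a classifier) can be illustrated with a toy sketch. This is not the authors' implementation: the class name `UserAwareKWS`, the layer sizes, the choice of concatenation as the fusion step, and the plain gradient-descent update are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class UserAwareKWS:
    """Hypothetical toy model: frozen backbone + learnable user embedding."""

    def __init__(self, n_in, n_feat, n_emb, n_classes, seed=0):
        rng = random.Random(seed)
        # Stand-ins for pretrained weights; never updated on device.
        self.W_backbone = random_matrix(rng, n_feat, n_in)
        self.W_cls = random_matrix(rng, n_classes, n_feat + n_emb)
        # The only on-device learnable parameters (assumed to be a vector).
        self.user_emb = [0.0] * n_emb

    def forward(self, x):
        feat = [max(0.0, v) for v in matvec(self.W_backbone, x)]  # ReLU features
        fused = feat + self.user_emb  # fusion by concatenation (assumption)
        return softmax(matvec(self.W_cls, fused))

    def loss(self, x, y):
        """Cross-entropy loss for input x with integer label y."""
        return -math.log(self.forward(x)[y])

    def adapt_step(self, x, y, lr=0.1):
        """One gradient step on the user embedding only."""
        p = self.forward(x)
        # Gradient of cross-entropy w.r.t. the logits: softmax minus one-hot.
        g_logits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        n_feat = len(self.W_backbone)
        for j in range(len(self.user_emb)):
            # Chain rule through the classifier columns that read the embedding.
            g = sum(g_logits[k] * self.W_cls[k][n_feat + j]
                    for k in range(len(g_logits)))
            self.user_emb[j] -= lr * g
```

In this sketch, calling `adapt_step` repeatedly on a user's labelled utterance lowers the loss on that utterance while `W_backbone` and `W_cls` stay untouched, which is the property that keeps the update cheap: the gradient is only computed for the few embedding entries, not for the backbone.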
Related papers
- Few-shot Open-set Learning For On-device Customization Of Keyword Spotting Systems (2023)
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019)
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020)
- AutoKWS: Keyword Spotting With Differentiable Architecture Search (2020)
- Online Continual Learning In Keyword Spotting For Low-resource Devices Via Pooling High-order Temporal Statistics (2023)
- TinySV: Speaker Verification In TinyML With On-device Learning (2024)
- A 14uJ/Decision Keyword Spotting Accelerator With In-SRAM-Computing And On Chip Learning For Customization (2022)
- Broadcasted Residual Learning For Efficient Keyword Spotting (2021)