SLiCK: Exploiting Subsequences For Length-constrained Keyword Spotting
2024 · Kumari Nishu, Minsik Cho, Devang Naik
Abstract
User-defined keyword spotting on a resource-constrained edge device is challenging. However, keywords are often bounded by a maximum keyword length, which has been largely under-leveraged in prior works. Our analysis of keyword-length distribution shows that user-defined keyword spotting can be treated as a length-constrained problem, eliminating the need for aggregation over variable text length. This leads to our proposed method for efficient keyword spotting, SLiCK (exploiting Subsequences for Length-Constrained Keyword spotting). We further introduce a subsequence-level matching scheme to learn audio-text relations at a finer granularity, thus distinguishing similar-sounding keywords more effectively through enhanced context. In SLiCK, the model is trained with a multi-task learning approach using two modules: Matcher (utterance-level matching task, novel subsequence-level matching task) and Encoder (phoneme recognition task). The proposed method improves the baseline results on Li
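The abstract names two ideas but gives no implementation detail: fixing the text input to a bounded keyword length, and adding subsequence-level matching targets alongside the utterance-level and phoneme-recognition tasks. The sketch below only illustrates those ideas under stated assumptions. `MAX_PHONES`, the `<pad>` token, the choice of *prefixes* as the subsequences, and the loss weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical bound on keyword length in phonemes (assumption, not from the paper).
MAX_PHONES = 8
PAD = "<pad>"


def pad_to_max(phones: Sequence[str], max_len: int = MAX_PHONES) -> Tuple[List[str], List[int]]:
    """Pad a phoneme sequence to a fixed length and return it with a validity mask.

    Because every keyword fits within max_len, all text inputs share one shape,
    so no pooling/aggregation over a variable-length text sequence is needed.
    """
    if len(phones) > max_len:
        raise ValueError(f"keyword has {len(phones)} phonemes, limit is {max_len}")
    n_pad = max_len - len(phones)
    return list(phones) + [PAD] * n_pad, [1] * len(phones) + [0] * n_pad


def prefix_match_labels(keyword: Sequence[str], spoken: Sequence[str],
                        max_len: int = MAX_PHONES) -> List[int]:
    """Hypothetical subsequence-level targets (assumes subsequences are prefixes).

    Label k is 1 if the first k+1 keyword phonemes match the first k+1 spoken
    phonemes. Padded positions get 0 and would be masked out of the loss.
    """
    labels: List[int] = []
    matched = True
    for k in range(max_len):
        if k < len(keyword):
            # Once any earlier phoneme differs, every longer prefix is a mismatch.
            matched = matched and k < len(spoken) and keyword[k] == spoken[k]
            labels.append(int(matched))
        else:
            labels.append(0)
    return labels


def multitask_loss(utt_loss: float, sub_loss: float, phone_loss: float,
                   w_sub: float = 1.0, w_phone: float = 1.0) -> float:
    """Weighted sum of utterance-level, subsequence-level, and phoneme losses.

    The weighting scheme and default weights are assumptions for illustration.
    """
    return utt_loss + w_sub * sub_loss + w_phone * phone_loss
```

For a similar-sounding pair such as keyword `["P", "L", "EY"]` versus spoken `["P", "R", "EY"]`, `prefix_match_labels` yields `[1, 0, 0, 0, 0, 0, 0, 0]`: the targets record *where* the mismatch starts rather than only a single utterance-level "no match", which is one plausible reading of the finer-grained context the abstract describes.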
Related papers
- Phonmatchnet: Phoneme-guided Zero-shot Keyword Spotting For User-defined Keywords (2023) · 13.34
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020) · 0.00
- Phoneme-level Contrastive Learning For User-defined Keyword Spotting With Flexible Enrollment (2024) · 6.34
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017) · 12.40
- Contrastive Augmentation: An Unsupervised Learning Approach For Keyword Spotting In Speech Technology (2024) · 9.92
- Sequence Discriminative Training For Deep Learning Based Acoustic Keyword Spotting (2018) · 8.35
- Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech (2024) · 0.00
- Exploring Representation Learning For Small-footprint Keyword Spotting (2023) · 3.58