Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks
2020 · Théodore Bluche, Maël Primet, Thibault Gisselbrecht
Abstract
We explore a keyword-based spoken language understanding system, in which the intent of the user can be derived directly from the detection of a sequence of keywords in the query. In this paper, we focus on an open-vocabulary keyword spotting method, allowing users to define their own keywords without having to retrain the whole model. We describe the design choices leading to a fast, small-footprint system that can run on tiny devices for an arbitrary set of user-defined keywords, without training data specific to those keywords. The model, based on a quantized long short-term memory (LSTM) neural network trained with connectionist temporal classification (CTC), occupies less than 500 KB. Our approach takes advantage of some properties of the predictions of CTC-trained networks to calibrate the confidence scores and implement a fast detection algorithm. The proposed system outperforms a standard keyword-filler model approach.
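The abstract does not spell out the scoring or detection algorithm. As a rough, hypothetical illustration of how a CTC-trained network can score an arbitrary user-defined keyword without retraining, the sketch below runs a Viterbi pass over the keyword's CTC label sequence, letting the keyword start and end at any frame, and subtracts the per-frame best label score as a crude unconstrained "filler" term. The blank index 0, the function name `keyword_score`, and this particular filler normalization are assumptions made for illustration; this is not the authors' calibration or detection method.

```python
import math
from typing import Sequence

BLANK = 0  # assumption: the CTC blank symbol is label index 0


def keyword_score(log_probs: Sequence[Sequence[float]], keyword: Sequence[int]) -> float:
    """Best (keyword path minus greedy filler path) log-score over all windows.

    Hypothetical sketch, not the paper's algorithm.
    log_probs: T x K per-frame log-posteriors from a CTC-trained network.
    keyword: label indices (without blanks) spelling the keyword.
    Returns a value <= 0; 0.0 means some window's greedy decoding is
    exactly the keyword. Returns -inf if the keyword cannot fit.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    # Extended CTC label sequence: blank, l1, blank, l2, ..., lL, blank
    ext = [BLANK]
    for label in keyword:
        ext += [label, BLANK]
    last = len(ext) - 2  # state of the final keyword label
    neg_inf = -math.inf
    prev = [neg_inf] * len(ext)
    best = neg_inf
    for frame in log_probs:
        filler = max(frame)  # unconstrained "filler": best label at this frame
        cur = [neg_inf] * len(ext)
        # Only states 1..last are used: the window starts on the first
        # keyword label and ends on the last one.
        for s in range(1, last + 1):
            cands = [prev[s]]  # stay in the same state
            if s - 1 >= 1:
                cands.append(prev[s - 1])  # advance by one state
            # Skip the blank between two different consecutive labels.
            if s - 2 >= 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                cands.append(prev[s - 2])
            if s == 1:
                cands.append(0.0)  # the keyword may start at any frame
            incoming = max(cands)
            if incoming > neg_inf:
                # Per-frame gain relative to the filler is always <= 0.
                cur[s] = incoming + frame[ext[s]] - filler
        best = max(best, cur[last])  # the keyword may end at any frame
        prev = cur
    return best
```

Because each frame contributes a non-positive gain relative to the filler, longer windows are never favored and no explicit length normalization is needed. A detector built on this would compare the score against a tuned threshold, which is likewise an assumption here.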
Related papers
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019)
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017)
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017)
- Efficient Keyword Spotting Using Dilated Convolutions And Gating (2018)
- Small-footprint Keyword Spotting With Graph Convolutional Network (2019)
- End-to-end Streaming Keyword Spotting (2018)
- A Separable Temporal Convolution Neural Network With Attention For Small-footprint Keyword Spotting (2021)
- Open Vocabulary Keyword Spotting Through Transfer Learning From Speech Synthesis (2024)