Efficient Keyword Spotting Using Dilated Convolutions And Gating
2018 · Alice Coucke, Mohammed Chlieh, Thibault Gisselbrecht, et al.
Abstract
We explore the application of end-to-end stateless temporal modeling to small-footprint keyword spotting, as opposed to recurrent networks that model long-term temporal dependencies using internal states. We propose a model inspired by the recent success of dilated convolutions in sequence modeling applications, which makes it possible to train deeper architectures in resource-constrained configurations. Gated activations and residual connections are also added, following a configuration similar to WaveNet. In addition, we apply a custom target labeling that back-propagates loss from specific frames of interest, yielding higher accuracy and requiring only the detection of the end of the keyword. Our experimental results show that our model outperforms a recurrent neural network using LSTM cells trained with a max-pooling loss, with a significant decrease in false rejection rate. The underlying dataset - "Hey Snips" utterances recorded by over 2.2K different speakers - has been made publicly available to e
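To make the architecture named in the abstract concrete, here is a minimal single-channel sketch of a stack of dilated convolutions with WaveNet-style gated activations and residual connections. It is an illustration under stated assumptions, not the authors' implementation: the causal padding, the kernel size of 2, the dilation schedule `[1, 2, 4]`, the weights, and all function names are hypothetical, and the 1x1 convolutions and skip connections used in WaveNet are omitted for brevity.

```python
import math

def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution: weights[k] multiplies x[t - k * dilation].
    Samples before t = 0 are treated as zero (assumed left padding)."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_residual_block(x, filter_w, gate_w, dilation):
    """Gated activation tanh(filter) * sigmoid(gate), added back to the input."""
    f = dilated_causal_conv(x, filter_w, dilation)
    g = dilated_causal_conv(x, gate_w, dilation)
    gated = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    return [xi + zi for xi, zi in zip(x, gated)]

def receptive_field(kernel_size, dilations):
    """Number of input frames that can influence one output frame."""
    return 1 + (kernel_size - 1) * sum(dilations)

# Hypothetical setup: a unit impulse pushed through three stacked blocks.
x = [0.0] * 16
x[0] = 1.0
dilations = [1, 2, 4]
h = x
for d in dilations:
    h = gated_residual_block(h, [0.5, 0.5], [0.0, 0.0], d)

rf = receptive_field(2, dilations)  # 1 + (2 - 1) * (1 + 2 + 4)
```

In this sketch the impulse at frame 0 spreads only as far as the receptive field allows, which is why doubling the dilation at each layer covers a long context with few layers and no recurrent state.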
Related papers
- Deep Residual Learning For Small-footprint Keyword Spotting (2017) 16.21
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017) 12.40
- Efficient Keyword Spotting By Capturing Long-range Interactions With Temporal Lambda Networks (2021) 0.00
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020) 0.00
- Small-footprint Keyword Spotting With Graph Convolutional Network (2019) 10.48
- Depthwise Separable Convolutional Resnet With Squeeze-and-excitation Blocks For Small-footprint Keyword Spotting (2020) 11.29
- End-to-end Streaming Keyword Spotting (2018) 12.10
- Separable Temporal Convolution Plus Temporally Pooled Attention For Lightweight High-performance Keyword Spotting (2021) 0.00