Separable Temporal Convolution Plus Temporally Pooled Attention For Lightweight High-performance Keyword Spotting
2021 · Shenghua Hu, Jing Wang, Yujun Wang, et al.
Abstract
Keyword spotting (KWS) on mobile devices generally requires a small memory footprint. However, most current models still keep a large number of parameters to ensure good performance. In this paper, we propose a temporally pooled attention module that captures global features better than average pooling. In addition, we design a separable temporal convolution network that combines depthwise separable convolution with temporal convolution to reduce the number of parameters and computations. Finally, building on separable temporal convolution and temporally pooled attention, we design an efficient neural network (ST-AttNet) for KWS systems. We evaluate the models on the publicly available Google Speech Commands dataset V1. The proposed model has 48K parameters, about 1/6 of the state-of-the-art TC-ResNet14-1.5 model (305K). It achieves 96.6% accuracy, which is comparable to the TC-ResNet14-1.5 model (96.6%).
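The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give layer sizes or the exact form of the attention module, so the channel width (64), the kernel size (9), and the use of plain softmax weights over time are all assumptions chosen for illustration. The first two functions count weights for a standard versus a depthwise separable 1D convolution; the last two contrast attention-weighted pooling over time with average pooling.

```python
import math

def conv1d_params(c_in, c_out, kernel):
    """Weight count of a standard 1D (temporal) convolution, bias omitted."""
    return c_in * c_out * kernel

def separable_conv1d_params(c_in, c_out, kernel):
    """Weight count of a depthwise separable 1D convolution, bias omitted:
    one length-`kernel` filter per input channel, then a 1x1 pointwise mix."""
    return c_in * kernel + c_in * c_out

def average_pool(frames):
    """Unweighted mean over time of T feature vectors."""
    t = len(frames)
    return [sum(f[d] for f in frames) / t for d in range(len(frames[0]))]

def attention_pool(frames, scores):
    """Softmax-weighted mean over time (generic attention pooling).
    `scores` holds one scalar per frame; in a trained model these would
    come from a learned layer, here they are supplied by hand (assumption)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * f[d] for w, f in zip(weights, frames))
            for d in range(len(frames[0]))]

# Hypothetical layer: 64 input channels, 64 output channels, kernel size 9.
standard = conv1d_params(64, 64, 9)
separable = separable_conv1d_params(64, 64, 9)
print(standard, separable)

# With equal scores, attention pooling reduces to average pooling.
frames = [[1.0, 2.0], [3.0, 4.0]]
print(attention_pool(frames, [0.0, 0.0]), average_pool(frames))
```

Under these assumed sizes the separable layer needs far fewer weights than the standard one, which is the kind of saving the abstract attributes to separable temporal convolution. Unequal scores let the pooled vector emphasize some frames over others, which average pooling cannot do.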
Related papers
- A Separable Temporal Convolution Neural Network With Attention For Small-footprint Keyword Spotting (2021)
- Temporal Convolution For Real-time Keyword Spotting On Mobile Devices (2019)
- Small-footprint Keyword Spotting With Multi-scale Temporal Convolution (2020)
- Depthwise Separable Convolutional Resnet With Squeeze-and-excitation Blocks For Small-footprint Keyword Spotting (2020)
- Small-footprint Keyword Spotting With Graph Convolutional Network (2019)
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017)
- Online Continual Learning In Keyword Spotting For Low-resource Devices Via Pooling High-order Temporal Statistics (2023)
- Efficient Keyword Spotting By Capturing Long-range Interactions With Temporal Lambda Networks (2021)