Deep Residual Learning For Small-footprint Keyword Spotting
2017 · Raphael Tang, Jimmy Lin
Abstract
We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google's previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models that also outperform previous small-footprint variants. To our knowledge, we are the first to examine these approaches for keyword spotting, and our results establish an open-source state-of-the-art reference to support the development of future speech-based interfaces.
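As a rough illustration of the two ideas named in the abstract, the sketch below shows a residual connection wrapped around a dilated convolution, and how dilation widens the receptive field without adding weights. It is a minimal pure-Python sketch, not the authors' implementation: the 1-D setting, the kernel size of 3, the doubling dilation schedule, and the function names are all assumptions chosen for illustration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input samples) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(x, weights, dilation):
    """Zero-padded, same-length 1-D convolution with an odd-sized kernel."""
    half = (len(weights) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x, weights, dilation):
    """y = x + relu(conv(x)): the skip connection adds the input back."""
    fx = [max(0.0, v) for v in dilated_conv1d(x, weights, dilation)]
    return [a + b for a, b in zip(x, fx)]


# Hypothetical schedules: four layers each, same number of weights.
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # dilation doubles per layer
```

With these assumed settings, the dilated stack covers 31 input samples against 9 for the undilated one at the same parameter count, which is one reason dilation is attractive for small-footprint models. A block whose convolution weights are all zero reduces to the identity, which is the property that makes deep residual stacks easier to train.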
Related papers
- Depthwise Separable Convolutional Resnet With Squeeze-and-excitation Blocks For Small-footprint Keyword Spotting (2020)
- Small-footprint Keyword Spotting With Graph Convolutional Network (2019)
- Broadcasted Residual Learning For Efficient Keyword Spotting (2021)
- Efficient Keyword Spotting Using Dilated Convolutions And Gating (2018)
- Neural Architecture Search For Keyword Spotting (2020)
- Few-shot Keyword Spotting With Prototypical Networks (2020)
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019)
- A Separable Temporal Convolution Neural Network With Attention For Small-footprint Keyword Spotting (2021)