A Neural Attention Model For Speech Command Recognition
2018 · Douglas Coimbra de Andrade, Sabato Leo, Martin Loesener da Silva Viana, et al.
Abstract
This paper introduces a convolutional recurrent network with attention for speech command recognition. Attention models are powerful tools for improving performance on natural language, image captioning, and speech tasks. The proposed model establishes a new state-of-the-art accuracy of 94.1% on the Google Speech Commands dataset V1 and 94.5% on V2 (for the 20-command recognition task), while keeping a small footprint of only 202K trainable parameters. Results are compared with previous convolutional implementations on five tasks: 20-command recognition (V1 and V2), 12-command recognition (V1), 35-word recognition (V1), and left-right recognition (V1). We show detailed performance results and demonstrate that the proposed attention mechanism not only improves performance but also allows inspecting which regions of the audio the network took into consideration when outputting a given category.
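The abstract does not spell out how the attention layer is built, so the following is only a minimal sketch of generic dot-product attention over a sequence of recurrent outputs, not the authors' implementation. The function name `attention_pool`, the choice of the last hidden state as the query, and the random input are all assumptions made for illustration. It shows how attention weights both produce a pooled summary vector and indicate which time frames contributed most.

```python
import numpy as np


def attention_pool(hidden_states, query):
    """Generic dot-product attention over time (illustrative, hypothetical helper).

    hidden_states: array of shape (T, d), one vector per audio frame.
    query: array of shape (d,).
    Returns (context, weights) with shapes (d,) and (T,).
    """
    scores = hidden_states @ query          # one similarity score per frame, shape (T,)
    scores = scores - scores.max()          # subtract the max for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()       # softmax: non-negative, sums to 1
    context = weights @ hidden_states       # weighted average of the frames, shape (d,)
    return context, weights


# Stand-in for recurrent-layer outputs: 50 frames of 8 features (random, not real audio).
rng = np.random.default_rng(0)
H = rng.normal(size=(50, 8))

# Assumption: use the last frame's state as the query vector.
context, weights = attention_pool(H, H[-1])

# Frames with the largest weights are the ones the pooled vector relies on most.
top_frames = np.argsort(weights)[::-1][:5]
print("most attended frames:", top_frames)
```

Plotting `weights` against time is one simple way to inspect which parts of an utterance a model of this kind attends to.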
Related papers
- Advancing Connectionist Temporal Classification With Attention Modeling (2018)
- Frequency And Temporal Convolutional Attention For Text-independent Speaker Recognition (2019)
- Attention Based Fully Convolutional Network For Speech Emotion Recognition (2018)
- ConvMixer: Feature Interactive Convolution With Curriculum Learning For Small Footprint And Noisy Far-field Keyword Spotting (2022)
- Speech Recognition: Keyword Spotting Through Image Recognition (2018)
- Convolution-based Channel-frequency Attention For Text-independent Speaker Verification (2022)
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020)
- Utterance-level End-to-end Language Identification Using Attention-based CNN-BLSTM (2019)