Query-by-example Keyword Spotting System Using Multi-head Attention And Softtriple Loss
2021 · Jinmiao Huang, Waseem Gharbieh, Han Suk Shim, et al.
Abstract
This paper proposes a neural network architecture for the query-by-example, user-defined keyword spotting task. A multi-head attention module is added on top of a multi-layered GRU for effective feature extraction, and a normalized multi-head attention module is proposed for feature aggregation. We also adopt the softtriple loss, a combination of triplet loss and softmax loss, and showcase its effectiveness. We demonstrate the performance of our model on internal datasets in different languages and on the public Hey-Snips dataset. We compare our model against a baseline system and conduct an ablation study to show the benefit of each component in our architecture. The proposed work shows solid performance while preserving simplicity.
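To make the loss mentioned in the abstract more concrete, here is a minimal NumPy sketch of a softtriple-style loss, following the commonly published formulation in which each class owns several learnable centers. It is not the authors' implementation. The function name `softtriple_loss`, the array shapes, and the hyperparameters `lam`, `gamma`, and `delta` (with their default values) are assumptions chosen for illustration, and the center regularizer used in some formulations is omitted.

```python
import numpy as np


def softtriple_loss(embeddings, labels, centers, lam=20.0, gamma=0.1, delta=0.01):
    """Softtriple-style loss (illustrative sketch, hypothetical defaults).

    embeddings: (N, D) array, assumed L2-normalized
    labels:     (N,) integer class ids in [0, C)
    centers:    (C, K, D) array, K centers per class, assumed L2-normalized
    """
    rows = np.arange(len(labels))

    # sim[n, c, k] = dot product between embedding n and center k of class c.
    sim = np.einsum("nd,ckd->nck", embeddings, centers)

    # Soft assignment of each embedding to the K centers of every class.
    z = sim / gamma
    z = z - z.max(axis=2, keepdims=True)  # numerical stability
    weights = np.exp(z)
    weights = weights / weights.sum(axis=2, keepdims=True)

    # Relaxed per-class similarity: weighted average over the class's centers.
    class_sim = (weights * sim).sum(axis=2)  # shape (N, C)

    # Triplet-like margin, subtracted only from the true class.
    margin = np.zeros_like(class_sim)
    margin[rows, labels] = delta

    # Scaled softmax cross-entropy over the per-class similarities.
    logits = lam * (class_sim - margin)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())


# Toy usage: 3 classes, 2 centers each, orthonormal centers in 6 dimensions.
centers = np.eye(6).reshape(3, 2, 6)
labels = np.array([0, 1, 2])
embeddings = centers[labels, 0]  # each embedding sits on one center of its class
loss = softtriple_loss(embeddings, labels, centers)
```

The softmax term supplies the classification objective, while the margin and the multiple centers per class play the role of the triplet constraint without explicit triplet sampling. In a query-by-example setting, the learned embeddings rather than the class logits would be compared at test time.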