Exploring Sequence-to-sequence Transformer-transducer Models For Keyword Spotting
2022 · Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, et al.
Abstract
In this paper, we present a novel approach to adapting a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task. We achieve this by replacing the keyword in the text transcription with a special token <kw> and training the system to detect the <kw> token in an audio stream. At inference time, we use a decision function inspired by conventional KWS approaches so that the system better suits the KWS task. Furthermore, we introduce a KWS-specific loss by adapting the sequence-discriminative Minimum Bayes-Risk (MBR) training technique. We find that our approach significantly outperforms ASR-based KWS systems. Compared with a conventional keyword spotting system, our proposal achieves similar performance while bringing the advantages and flexibility of sequence-to-sequence training. Additionally, when combined with the conventional KWS system, our approach can improve performance at any operating point.
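The abstract names three components only at a high level: rewriting training transcripts so the keyword becomes a <kw> token, a KWS-style decision function at inference, and an MBR-based KWS loss. The Python sketch below illustrates them under stated assumptions and is not the authors' implementation. `mask_keyword` follows the label-rewriting idea described above. The decision rule in `detect_keyword` (moving-average smoothing of per-step <kw> posteriors plus a threshold) and the 0/1 risk in `kws_mbr_loss` (a hypothesis is penalized when it disagrees with the reference about whether <kw> occurs) are hypothetical stand-ins, because the abstract does not specify either. The loss is computed on plain floats with no gradients, so it shows only the expected-risk arithmetic over an N-best list.

```python
import math
import re
from typing import List, Sequence, Tuple

KW_TOKEN = "<kw>"


def mask_keyword(transcript: str, keyword: str) -> str:
    """Replace every whole-word occurrence of `keyword` with the <kw> token."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.sub(pattern, KW_TOKEN, transcript, flags=re.IGNORECASE)


def detect_keyword(kw_posteriors: Sequence[float], threshold: float,
                   window: int = 3) -> bool:
    """HYPOTHETICAL decision rule: fire if the moving average of the
    per-step <kw> posteriors reaches `threshold` anywhere in the stream."""
    if not kw_posteriors:
        return False
    window = max(1, min(window, len(kw_posteriors)))
    best = max(
        sum(kw_posteriors[i:i + window]) / window
        for i in range(len(kw_posteriors) - window + 1)
    )
    return best >= threshold


def kws_mbr_loss(nbest: List[Tuple[List[str], float]],
                 keyword_present: bool) -> float:
    """Expected KWS risk over an N-best list of (tokens, log_prob) pairs.

    Hypothesis probabilities are renormalized over the list. The risk is a
    HYPOTHETICAL 0/1 choice: 1 if the hypothesis disagrees with the
    reference about whether <kw> occurs, else 0.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    # Subtract the max log-prob before exponentiating for numerical stability.
    max_lp = max(lp for _, lp in nbest)
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)
    risk = 0.0
    for (tokens, _), w in zip(nbest, weights):
        predicts_kw = KW_TOKEN in tokens
        risk += (w / total) * float(predicts_kw != keyword_present)
    return risk


if __name__ == "__main__":
    print(mask_keyword("Hey Google what time is it", "hey google"))
    print(detect_keyword([0.1, 0.9, 0.8, 0.2], threshold=0.5, window=2))
    nbest = [(["<kw>", "what"], math.log(0.75)), (["what"], math.log(0.25))]
    print(kws_mbr_loss(nbest, keyword_present=True))
```

For example, `mask_keyword("Hey Google what time is it", "hey google")` yields `"<kw> what time is it"`, and for a two-entry N-best list where the hypothesis without <kw> carries a quarter of the probability mass, the expected risk against a reference containing the keyword is 0.25.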
Related papers
- Sequence Discriminative Training For Deep Learning Based Acoustic Keyword Spotting (2018)
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017)
- Keyword Transformer: A Self-attention Model For Keyword Spotting (2021)
- Contrastive Augmentation: An Unsupervised Learning Approach For Keyword Spotting In Speech Technology (2024)
- GE2E-KWS: Generalized End-to-end Training And Evaluation For Zero-shot Keyword Spotting (2024)
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020)
- CTC-aligned Audio-text Embedding For Streaming Open-vocabulary Keyword Spotting (2024)
- LLM-Synth4KWS: Scalable Automatic Generation And Synthesis Of Confusable Data For Custom Keyword Spotting (2025)