Streaming Keyword Spotting Boosted By Cross-layer Discrimination Consistency
2024 Β· Yu Xi, Haoyu Li, Xiaoyu Gu, et al.
Abstract
Connectionist Temporal Classification (CTC), a non-autoregressive training criterion, is widely used in online keyword spotting (KWS). However, existing CTC-based KWS decoding strategies rely either on Automatic Speech Recognition (ASR), which performs suboptimally because it searches broadly over the acoustic space without keyword-specific optimization, or on KWS-specific decoding graphs, which are complex to implement and maintain. In this work, we propose a streaming decoding algorithm enhanced by Cross-layer Discrimination Consistency (CDC), tailored for CTC-based KWS. Specifically, we introduce a streamlined yet effective decoding algorithm capable of detecting the start of the keyword at an arbitrary position. Furthermore, we leverage discrimination consistency information across layers to better differentiate between positive and false alarm samples. Our experiments on both clean and noisy Hey Snips datasets show that the proposed streaming decoding strategy outperforms ASR-based a…
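The abstract does not spell out the decoding procedure, so the following is only a rough illustration, not the authors' algorithm. It sketches one generic way a streaming CTC keyword detector can let the keyword begin at any frame: over a sliding window of per-frame CTC log-posteriors, it finds the best strictly ordered assignment of keyword tokens to frames and fires when the geometric mean of those token posteriors crosses a threshold. The token ids, `threshold`, `max_frames`, and the ordered-peak scoring rule are all hypothetical choices made for this sketch; cross-layer discrimination consistency is not modeled.

```python
import math
from collections import deque
from typing import Deque, Optional, Sequence

NEG_INF = float("-inf")


def ordered_peak_score(window: Sequence[Sequence[float]], keyword: Sequence[int]) -> float:
    """Best sum of log-posteriors over frames t_1 < t_2 < ... < t_L, where frame
    t_i is assigned keyword token keyword[i].

    Frames not assigned to a keyword token (blanks, other speech) add nothing,
    so the keyword may start at any frame inside the window. Simplification
    (assumption): repeated consecutive tokens are not forced to have a blank
    between them, unlike a strict CTC path.
    """
    num_tokens = len(keyword)
    # best[j]: best score with the first j keyword tokens matched so far
    best = [0.0] + [NEG_INF] * num_tokens
    for frame in window:
        # Iterate j downward so a single frame is used by at most one token.
        for j in range(num_tokens, 0, -1):
            candidate = best[j - 1] + frame[keyword[j - 1]]
            if candidate > best[j]:
                best[j] = candidate
    return best[num_tokens]


class StreamingKeywordSpotter:
    """Hypothetical streaming detector fed one frame of CTC log-posteriors at a time."""

    def __init__(self, keyword: Sequence[int], threshold: float, max_frames: int):
        self.keyword = list(keyword)
        self.log_threshold = math.log(threshold)  # threshold in (0, 1]
        # Sliding window: the oldest frame is dropped once max_frames is reached.
        self.buffer: Deque[Sequence[float]] = deque(maxlen=max_frames)

    def push(self, log_probs: Sequence[float]) -> Optional[float]:
        """Feed one frame; return the keyword confidence if it fires, else None."""
        self.buffer.append(log_probs)
        total = ordered_peak_score(self.buffer, self.keyword)
        if total == NEG_INF:  # fewer frames than keyword tokens so far
            return None
        mean_log = total / len(self.keyword)  # geometric mean in the log domain
        if mean_log >= self.log_threshold:
            self.buffer.clear()  # reset so one occurrence triggers only once
            return math.exp(mean_log)
        return None


if __name__ == "__main__":
    # Toy vocabulary (hypothetical): 0 = blank, 1 = "hey", 2 = "snips".
    def lp(*probs: float) -> list:
        return [math.log(p) for p in probs]

    blank, hey, snips = lp(0.9, 0.05, 0.05), lp(0.1, 0.8, 0.1), lp(0.1, 0.1, 0.8)
    spotter = StreamingKeywordSpotter(keyword=[1, 2], threshold=0.5, max_frames=50)
    # Fires only on the frame where "snips" arrives after "hey".
    print([spotter.push(f) for f in [blank, hey, blank, snips, blank]])
```

Recomputing the window on every frame costs O(`max_frames` × keyword length) per frame. That keeps the sketch short, and the window bound stops tokens spoken far apart from being stitched into one detection.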
Related papers
- CTC-aligned Audio-text Embedding For Streaming Open-vocabulary Keyword Spotting (2024) · 3.58
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017) · 0.00
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017) · 12.40
- Masked Self-distilled Transducer-based Keyword Spotting With Semi-autoregressive Decoding (2025) · 2.26
- DCCRN-KWS: An Audio Bias Based Model For Noise Robust Small-footprint Keyword Spotting (2023) · 5.24
- Online Continual Learning In Keyword Spotting For Low-resource Devices Via Pooling High-order Temporal Statistics (2023) · 7.50
- Fast Context-biasing For CTC And Transducer ASR Models With CTC-based Word Spotter (2024) · 2.26
- An Investigation Of Enhancing CTC Model For Triggered Attention-based Streaming ASR (2021) · 0.00