Dual-attention Neural Transducers For Efficient Wake Word Spotting In Speech Recognition
2023 · Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, et al.
Abstract
We present dual-attention neural biasing, an architecture designed to improve wake word (WW) recognition and reduce inference-time latency on speech recognition tasks. The architecture switches its runtime compute path dynamically: WW spotting selects which branch of its attention networks to execute for each input audio frame. This approach improves WW spotting accuracy while reducing runtime compute cost, measured in floating point operations (FLOPs). On an in-house de-identified dataset, the proposed dual-attention network reduces compute cost by 90% for WW audio frames, with only a 1% increase in the number of parameters. It also improves WW F1 score by 16% relative and generic rare word error rate by 3% relative compared to the baselines.
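The per-frame branch selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `AttentionBranch` class, the `route_frames` helper, the FLOP formula, and all sizes are hypothetical, and the branch sizes are chosen only so that the per-frame ratio echoes the 90% figure quoted above. The sketch assumes a wake word spotter that emits one boolean decision per audio frame and that each branch attends over a fixed number of bias entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionBranch:
    """Hypothetical attention branch over a fixed set of bias entries."""
    name: str
    num_entries: int  # number of keys/values the branch attends over
    dim: int          # embedding dimension

    def flops_per_frame(self) -> int:
        # Rough count for one query: a dot product per key (2 * dim FLOPs)
        # plus a weighted sum over values (2 * dim FLOPs per entry).
        # Softmax and projection costs are ignored in this sketch.
        return 4 * self.num_entries * self.dim


def route_frames(ww_flags, ww_branch, generic_branch):
    """Select one branch per frame from the spotter's boolean decisions."""
    return [ww_branch if is_ww else generic_branch for is_ww in ww_flags]


def total_flops(branches):
    """Sum the per-frame attention cost over a sequence of chosen branches."""
    return sum(branch.flops_per_frame() for branch in branches)


# Hypothetical sizes: a small WW branch and a 10x larger generic branch.
WW_BRANCH = AttentionBranch("ww", num_entries=10, dim=64)
GENERIC_BRANCH = AttentionBranch("generic", num_entries=100, dim=64)

# Assumed spotter output: the first two frames contain the wake word.
ww_flags = [True, True, False, False, False]

chosen = route_frames(ww_flags, WW_BRANCH, GENERIC_BRANCH)
dual_cost = total_flops(chosen)
single_cost = total_flops([GENERIC_BRANCH] * len(ww_flags))

# Relative saving on a WW frame compared with always running the large branch.
ww_frame_saving = 1 - WW_BRANCH.flops_per_frame() / GENERIC_BRANCH.flops_per_frame()

print([branch.name for branch in chosen])
print(dual_cost, single_cost, ww_frame_saving)
```

In this toy setup only the frames flagged as wake word run the small branch, so the saving applies per WW frame. The total saving over an utterance depends on how many of its frames are flagged.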
Related papers
- Optimizing Dysarthria Wake-up Word Spotting: An End-to-end Approach For SLT 2024 LRDWWS Challenge (2024) · 2.26
- Bifocal Neural ASR: Exploiting Keyword Spotting For Inference Optimization (2021) · 7.50
- Robust Wake Word Spotting With Frame-level Cross-modal Attention Based Audio-visual Conformer (2024) · 5.24
- DCCRN-KWS: An Audio Bias Based Model For Noise Robust Small-footprint Keyword Spotting (2023) · 5.24
- Fast Context-biasing For CTC And Transducer ASR Models With CTC-based Word Spotter (2024) · 2.26
- Heimdal: Highly Efficient Method For Detection And Localization Of Wake-words (2022) · 3.58
- Multi-task Network For Noise-robust Keyword Spotting And Speaker Verification Using CTC-based Soft VAD And Global Query Attention (2020) · 9.41
- Lightweight Feature Encoder For Wake-up Word Detection Based On Self-supervised Speech Representation (2023) · 5.84