LLM-Synth4KWS: Scalable Automatic Generation And Synthesis Of Confusable Data For Custom Keyword Spotting
2025 · Pai Zhu, Quan Wang, Dhruuv Agarwal, et al.
Abstract
Custom keyword spotting (KWS) detects user-defined spoken keywords in streaming audio by comparing embeddings of voice enrollments against embeddings of the input audio. State-of-the-art custom KWS models are typically trained contrastively on utterances whose keywords are randomly sampled from the training dataset. These models often struggle with confusable keywords, such as "blue" versus "glue". This paper introduces an effective way to augment training with confusable utterances: keywords are generated and grouped by large language models (LLMs), and speech signals are synthesized in diverse speaking styles by text-to-speech (TTS) engines. To better measure user experience on confusable KWS, we define a new north-star metric, c-AUC, the average area under the DET curve across confusable groups. Featuring high scalability and zero labor cost, the proposed method improves AUC by 3.7% and c-AUC by 11.3% on the Speech Commands testing set.
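The abstract only names the c-AUC metric, so the following is a rough sketch of how such a quantity could be computed, not the authors' implementation. It assumes each confusable group supplies two lists of similarity scores: target scores (enrollment and test audio share the keyword) and confusable scores (test audio contains a different keyword from the same group, e.g. "glue" against a "blue" enrollment). The function names `det_area` and `c_auc`, the trapezoidal integration, and the "accept when score exceeds the threshold" rule are all illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple


def det_area(target: Sequence[float], confusable: Sequence[float]) -> float:
    """Area under the DET curve (false reject rate vs. false accept rate).

    Assumption: a trial is accepted when its score is strictly above the
    threshold. 0.0 means perfect separation; lower is better.
    """
    thresholds = sorted(set(target) | set(confusable))
    # Threshold below every score: everything accepted (FAR = 1, FRR = 0).
    prev_far, prev_frr = 1.0, 0.0
    area = 0.0
    for t in thresholds:
        far = sum(s > t for s in confusable) / len(confusable)
        frr = sum(s <= t for s in target) / len(target)
        # Trapezoid between consecutive operating points; FAR never increases.
        area += (prev_far - far) * (prev_frr + frr) / 2.0
        prev_far, prev_frr = far, frr
    return area


def c_auc(groups: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Unweighted mean of det_area over confusable groups (hypothetical layout)."""
    areas = [det_area(tgt, conf) for tgt, conf in groups.values()]
    return sum(areas) / len(areas)


# Toy scores, invented purely for illustration.
groups = {
    "blue/glue": ([0.9, 0.8], [0.1, 0.2]),   # well separated
    "play/pray": ([0.1, 0.2], [0.8, 0.9]),   # fully confused
}
print(c_auc(groups))
```

With this convention a group whose confusable scores all fall below its target scores contributes an area of 0, and a group where they all fall above contributes 1, so averaging per group keeps a few hard groups from being hidden by many easy ones.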
Related papers
- Synth4KWS: Synthesized Speech For User Defined Keyword Spotting In Low Resource Environments (2024)
- Phoneme-level Contrastive Learning For User-defined Keyword Spotting With Flexible Enrollment (2024)
- Contrastive Augmentation: An Unsupervised Learning Approach For Keyword Spotting In Speech Technology (2024)
- GE2E-KWS: Generalized End-to-end Training And Evaluation For Zero-shot Keyword Spotting (2024)
- Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech (2024)
- Training Wake Word Detection With Synthesized Speech Data On Confusion Words (2020)
- VIC-KD: Variance-invariance-covariance Knowledge Distillation To Make Keyword Spotting More Robust Against Adversarial Attacks (2023)
- Exploring Sequence-to-sequence Transformer-transducer Models For Keyword Spotting (2022)