Open Vocabulary Keyword Spotting Through Transfer Learning From Speech Synthesis
2024 · Kesavaraj V, Anil Kumar Vuppala
Abstract
Identifying keywords in an open-vocabulary context is crucial for personalizing interactions with smart devices. Previous approaches to open-vocabulary keyword spotting depend on a shared embedding space created by audio and text encoders. However, these approaches suffer from heterogeneous modality representations (i.e., audio-text mismatch). To address this issue, our proposed framework leverages knowledge acquired from a pre-trained text-to-speech (TTS) system. This knowledge transfer makes the text representations derived from the text encoder aware of audio projections. The performance of the proposed approach is compared with various baseline methods across four different datasets. The robustness of our proposed model is evaluated by assessing its performance across different word lengths and in an Out-of-Vocabulary (OOV) scenario. Additionally, the effectiveness of transfer learning from the TTS system is investigated by analyzing its different
Related papers
- Small-footprint Open-vocabulary Keyword Spotting With Quantized LSTM Networks (2020) 0.00
- Using Synthetic Audio To Improve The Recognition Of Out-of-vocabulary Words In End-to-end ASR Systems (2020) 12.33
- Synth4kws: Synthesized Speech For User Defined Keyword Spotting In Low Resource Environments (2024) 0.00
- CTC-aligned Audio-text Embedding For Streaming Open-vocabulary Keyword Spotting (2024) 3.58
- Few-shot Open-set Learning For On-device Customization Of Keyword Spotting Systems (2023) 8.60
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019) 9.92
- Online Continual Learning In Keyword Spotting For Low-resource Devices Via Pooling High-order Temporal Statistics (2023) 7.50
- Boosting Keyword Spotting Through On-device Learnable User Speech Characteristics (2024) 0.00