An End-to-end Architecture For Keyword Spotting And Voice Activity Detection
2016 · Chris Lengerich, Awni Hannun
Abstract
We propose a single neural network architecture for two tasks: on-line keyword spotting and voice activity detection. We develop novel inference algorithms for an end-to-end Recurrent Neural Network trained with the Connectionist Temporal Classification loss function. These algorithms allow our model to achieve high accuracy on both keyword spotting and voice activity detection without retraining. In contrast to prior voice activity detection models, our architecture does not require aligned training data and uses the same parameters as the keyword spotting model. This allows us to deploy a high-quality voice activity detector with no additional memory or maintenance requirements.
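The abstract does not spell out the inference algorithms, so the following is only a minimal sketch of how one set of CTC outputs could serve both tasks. It assumes that the keyword score is the CTC log-likelihood of the keyword's label sequence over a window of frames, and that the voice activity score is the probability that the window is not entirely blank. The function names, the input layout (one list of per-symbol log-probabilities per frame), and the blank index are hypothetical and not taken from the paper.

```python
import math

NEG_INF = -math.inf


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_log_score(log_probs, labels, blank=0):
    """Log-probability of `labels` given the frames, via the CTC forward pass.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    labels:    keyword as a list of symbol indices (assumed representation).
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][blank]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for frame in log_probs[1:]:
        new = [NEG_INF] * size
        for s in range(size):
            acc = alpha[s]
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])
            new[s] = acc + frame[ext[s]]
        alpha = new

    total = alpha[-1]
    if size > 1:
        total = log_add(total, alpha[-2])
    return total


def speech_log_prob(log_probs, blank=0):
    """Assumed VAD score: log(1 - P(every frame in the window is blank))."""
    p_all_blank = math.exp(sum(frame[blank] for frame in log_probs))
    if p_all_blank >= 1.0:
        return NEG_INF
    return math.log1p(-p_all_blank)


# Two frames, two symbols (0 = blank, 1 = a keyword character), uniform outputs.
frames = [[math.log(0.5), math.log(0.5)] for _ in range(2)]
keyword_score = ctc_log_score(frames, [1])
vad_score = speech_log_prob(frames)
```

Both scores are read from the same per-frame output distribution, which is the sense in which a voice activity detector could share parameters with the keyword spotter. Thresholds for either score would need to be tuned on held-out data.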
Related papers
- Predicting Detection Filters For Small Footprint Open-vocabulary Keyword Spotting (2019) · 9.92
- Neural Architecture Search For Keyword Spotting (2020) · 10.61
- End-to-end Streaming Keyword Spotting (2018) · 12.10
- Efficient Keyword Spotting Using Time Delay Neural Networks (2018) · 10.21
- Multi-task Network For Noise-robust Keyword Spotting And Speaker Verification Using CTC-based Soft VAD And Global Query Attention (2020) · 9.41
- Efficient Keyword Spotting Using Dilated Convolutions And Gating (2018) · 13.84
- Streaming Small-footprint Keyword Spotting Using Sequence-to-sequence Models (2017) · 12.40
- AutoKWS: Keyword Spotting With Differentiable Architecture Search (2020) · 9.92