Lightweight Feature Encoder For Wake-up Word Detection Based On Self-supervised Speech Representation
2023 · Hyungjun Lim, Younggwan Kim, Kiho Yeom, et al.
Abstract
Self-supervised learning methods that provide generalized speech representations have recently received increasing attention. Wav2vec 2.0 is the best-known example, showing remarkable performance in numerous downstream speech processing tasks. Despite its success, it is challenging to use directly for wake-up word detection on mobile devices because of its high computational cost. In this work, we propose LiteFEW, a lightweight feature encoder for wake-up word detection that preserves the inherent ability of wav2vec 2.0 at a minimal scale. In this method, the knowledge of the pre-trained wav2vec 2.0 is compressed with an auto-encoder-based dimensionality reduction technique and distilled into LiteFEW. Experimental results on the open-source "Hey Snips" dataset show that the proposed method, applied to various model structures, significantly improves performance, achieving over 20% relative improvement with only 64k parameters.
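The two-stage idea in the abstract (compress the teacher features with an auto-encoder, then distill the compressed features into a small student) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the layer types, dimensions, or loss functions, so the linear maps, the mean-squared-error objectives, the toy sizes `TEACHER_DIM` and `CODE_DIM`, and the random stand-in vectors are hypothetical and are not the authors' implementation. No training loop is shown; the sketch only computes the two losses that such a setup might minimize.

```python
import random


def linear(x, w):
    """Apply a bias-free linear map; w holds one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for untrained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
TEACHER_DIM, CODE_DIM = 16, 4  # hypothetical toy sizes, not the paper's

# Stand-in for one frame of features from a pre-trained teacher model.
teacher_frame = [rng.gauss(0.0, 1.0) for _ in range(TEACHER_DIM)]

# Stage 1 (assumed form): an auto-encoder compresses the teacher features
# to a low-dimensional code and is scored on how well it reconstructs them.
encoder = random_matrix(CODE_DIM, TEACHER_DIM, rng)
decoder = random_matrix(TEACHER_DIM, CODE_DIM, rng)
code = linear(teacher_frame, encoder)
reconstruction_loss = mse(linear(code, decoder), teacher_frame)

# Stage 2 (assumed form): the small student's output for the same frame is
# pulled toward the compressed code rather than the full teacher features.
student_frame = [rng.gauss(0.0, 1.0) for _ in range(CODE_DIM)]
distillation_loss = mse(student_frame, code)
```

The point of the intermediate code is that the student only has to match a target of its own size, which is what lets the student stay small while still learning from the large teacher.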
Related papers
- Wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Wav2vec: Unsupervised Pre-training For Speech Recognition (2019)
- Unsupervised Speech Representation Learning Using Wavenet Autoencoders (2019)
- Exploring Wav2vec 2.0 On Speaker Verification And Language Identification (2020)
- Federated Learning For Keyword Spotting (2018)
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022)
- Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations (2019)
- Unsupervised Speech Recognition (2021)