Multi-task Network For Noise-robust Keyword Spotting And Speaker Verification Using Ctc-based Soft VAD And Global Query Attention
2020 Β· Myunghun Jung, Youngmoon Jung, Jahyun Goo, et al.
Abstract
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully utilize the interrelated domain information. The multi-task network tightly combines sub-networks aiming at performance improvement in challenging conditions such as noisy environments, open-vocabulary KWS, and short-duration SV, by introducing novel techniques of connectionist temporal classification (CTC)-based soft voice activity detection (VAD) and global query attention. Frame-level acoustic and speaker information is integrated with phonetically originated weights so that forms a word-level global representation. Then it is used for the aggregation of feature vectors to generate discriminative embeddings. Our proposed approach shows 4.06% and 26.71% relative improvements in equal error rate (EER) compared to the baselines for bot
Authors
(none)
Tags
Stats
Related papers
- Multi-task Voice Activated Framework Using Self-supervised Learning (2021)6.34
- Frequency And Multi-scale Selective Kernel Attention For Speaker Verification (2022)10.07
- Separable Temporal Convolution Plus Temporally Pooled Attention For Lightweight High-performance Keyword Spotting (2021)0.00
- DCCRN-KWS: An Audio Bias Based Model For Noise Robust Small-footprint Keyword Spotting (2023)5.24
- Aca-net: Towards Lightweight Speaker Verification Using Asymmetric Cross Attention (2023)0.00
- An End-to-end Architecture For Keyword Spotting And Voice Activity Detection (2016)0.00
- Query-by-example Keyword Spotting Using Spectral-temporal Graph Attentive Pooling And Multi-task Learning (2024)0.00
- Small-footprint Keyword Spotting Using Deep Neural Network And Connectionist Temporal Classifier (2017)0.00