Self-adaptive Soft Voice Activity Detection Using Deep Neural Networks For Robust Speaker Verification
2019 · Youngmoon Jung, Yeunju Choi, Hoirin Kim
Abstract
Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications, including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to incorporate a deep neural network (DNN)-based VAD into a deep speaker embedding system. The proposed method combines two approaches. The first is soft VAD, which performs a soft selection of the frame-level features extracted by a speaker feature extractor: each frame-level feature is weighted by its speech posterior, as estimated by the DNN-based VAD, and the weighted features are then aggregated to generate a speaker embedding. The second is self-adaptive VAD, which fine-tunes the pre-trained VAD on the speaker verification data to reduce the domain mismatch. To this end, we introduce two unsupervised domain adaptation (DA) schemes, namely speech posterior-based DA (SP-DA) and joint learning-based DA (JL-DA). Experiments on …
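The soft VAD step can be illustrated with a minimal sketch. It rests on a few assumptions that are not stated in the abstract. First, frame-level features arrive as T vectors of dimension D. Second, the VAD supplies one speech posterior in [0, 1] per frame. Third, the aggregation is x-vector-style statistics pooling, meaning a posterior-weighted mean and standard deviation; the abstract does not say which pooling the authors use. The function names `soft_vad_pooling` and `hard_vad_pooling` are hypothetical. The hard-VAD variant appears only for contrast.

```python
import math
from typing import List, Sequence, Tuple


def soft_vad_pooling(
    frames: Sequence[Sequence[float]],
    speech_posteriors: Sequence[float],
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Hypothetical soft-VAD pooling: weight each frame by its speech posterior.

    frames: T frame-level feature vectors, each of dimension D.
    speech_posteriors: T values in [0, 1], the VAD's P(speech | frame t).
    Returns the posterior-weighted mean and standard deviation (each length D).
    Statistics pooling is an assumption, not necessarily the paper's choice.
    """
    if not frames or len(frames) != len(speech_posteriors):
        raise ValueError("need a non-empty input with one posterior per frame")
    total = sum(speech_posteriors) + eps  # eps guards against all-silence input
    dim = len(frames[0])

    # Posterior-weighted mean: frames the VAD considers non-speech count less.
    mean = [0.0] * dim
    for x, p in zip(frames, speech_posteriors):
        for d in range(dim):
            mean[d] += p * x[d]
    mean = [m / total for m in mean]

    # Posterior-weighted variance around that mean, then its square root.
    var = [0.0] * dim
    for x, p in zip(frames, speech_posteriors):
        for d in range(dim):
            var[d] += p * (x[d] - mean[d]) ** 2
    std = [math.sqrt(v / total) for v in var]
    return mean, std


def hard_vad_pooling(
    frames: Sequence[Sequence[float]],
    speech_posteriors: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Hypothetical hard-VAD baseline: keep only frames above the threshold."""
    weights = [1.0 if p > threshold else 0.0 for p in speech_posteriors]
    return soft_vad_pooling(frames, weights)


frames = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
soft_mean, _ = soft_vad_pooling(frames, [0.9, 0.8, 0.1])
hard_mean, _ = hard_vad_pooling(frames, [0.9, 0.8, 0.1])
```

Suppose the posteriors are exactly 0 or 1. Then the soft weighting reduces to hard frame selection. Fractional posteriors behave differently. A frame with low speech confidence is not discarded outright. It still contributes to the embedding statistics in proportion to its posterior. In the usage lines, for example, the third frame (posterior 0.1) pulls `soft_mean` away from `hard_mean`, whereas the hard baseline drops that frame entirely.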
Related papers
- Personal VAD: Speaker-conditioned Voice Activity Detection (2019) · 13.05
- A Unified Deep Learning Framework For Short-duration Speaker Verification In Adverse Environments (2020) · 9.41
- DEAAN: Disentangled Embedding And Adversarial Adaptation Network For Robust Speaker Representation Learning (2020) · 9.59
- Neural Network Based Speaker Classification And Verification Systems With Enhanced Features (2017) · 8.60
- MLNET: An Adaptive Multiple Receptive-field Attention Neural Network For Voice Activity Detection (2020) · 3.58
- VAE-based Domain Adaptation For Speaker Verification (2019) · 7.50
- Adapting End-to-end Neural Speaker Verification To New Languages And Recording Conditions With Adversarial Training (2018) · 9.59
- Noise-robust Target-speaker Voice Activity Detection Through Self-supervised Pretraining (2025) · 0.00