Adversarial Multi-task Deep Learning For Noise-robust Voice Activity Detection With Low Algorithmic Delay
2022 · Claus Meyer Larsen, Peter Koch, Zheng-Hua Tan
Abstract
Voice Activity Detection (VAD) is an important pre-processing step in a wide variety of speech processing systems. In a practical application, a VAD should be able to detect speech in both noisy and noise-free environments without introducing significant latency. In this work we propose an adversarial multi-task learning method for training a supervised VAD. The method is applied to the state-of-the-art VAD Waveform-based Voice Activity Detection. Additionally, the performance of the VAD is investigated under different algorithmic delays, which are an important contributor to latency. Introducing adversarial multi-task learning to the model is observed to increase performance in terms of Area Under Curve (AUC), particularly in noisy environments, without degrading performance at higher SNR levels. The adversarial multi-task learning is applied only in the training phase and thus adds no cost at test time. Furthermore, the correlation between performance and a
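Since results are reported as AUC, the snippet below is a minimal sketch of how a frame-level AUC could be computed from VAD scores. It assumes one score per frame and binary frame labels (1 = speech, 0 = non-speech); the function name `frame_auc` is hypothetical, and the paper's exact evaluation protocol (frame rate, averaging across SNR conditions) is not stated on this page. It uses the rank-based (Mann-Whitney U) formulation of ROC AUC, with tied scores counted as half.

```python
from typing import Sequence


def frame_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC for frame-level VAD scores (hypothetical helper).

    Equals the probability that a randomly chosen speech frame (label 1)
    receives a higher score than a randomly chosen non-speech frame (label 0),
    with ties counted as half.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 (non-speech) or 1 (speech)")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one speech and one non-speech frame")

    # Assign 1-based ranks in ascending score order; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic for the speech frames, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: two non-speech frames followed by two speech frames.
    print(frame_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Because AUC is computed over all thresholds, it compares VADs without fixing a decision threshold, which is convenient when operating points differ across SNR levels.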
Related papers
- Speech Enhancement Aided End-to-end Multi-task Learning For Voice Activity Detection (2020) · 11.49
- Advancing VAD Systems Based On Multi-task Learning With Improved Model Structures (2023) · 0.00
- Semantic VAD: Low-latency Voice Activity Detection For Speech Interaction (2023) · 6.34
- Noise-robust Target-speaker Voice Activity Detection Through Self-supervised Pretraining (2025) · 0.00
- Incorporating VAD Into ASR System By Multi-task Learning (2021) · 4.52
- MLNET: An Adaptive Multiple Receptive-field Attention Neural Network For Voice Activity Detection (2020) · 3.58
- Self-supervised Pretraining For Robust Personalized Voice Activity Detection In Adverse Conditions (2023) · 6.34
- Channel-combination Algorithms For Robust Distant Voice Activity And Overlapped Speech Detection (2024) · 6.34