Waveform-based Voice Activity Detection Exploiting Fully Convolutional Networks With Multi-branched Encoders
2020 · Cheng Yu, Kuo-Hsuan Hung, I-Fan Lin, et al.
Abstract
In this study, we propose an encoder-decoder system built from fully convolutional networks that performs voice activity detection (VAD) directly on the time-domain waveform. The proposed system processes the input waveform and labels each segment as either speech or non-speech. This novel waveform-based VAD algorithm, abbreviated "WVAD", has two distinguishing features. First, in contrast to most conventional VAD systems, which use spectral features, the raw waveforms used in WVAD contain more comprehensive information and are thus expected to support more accurate speech/non-speech predictions. Second, thanks to its multi-branched architecture, WVAD can be extended with an ensemble of encoders, referred to as WEVAD, that incorporates multiple kinds of attribute information in utterances and can thus yield better VAD performance under specified acoustic conditions. We evaluated the presented WVAD and WEVAD for the VAD task in two datasets: First, the experiments conducted
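The multi-branched idea described in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' architecture: the function names (`conv1d_same`, `encode`, `wevad_sketch`), the single-layer encoders, the averaging used to fuse the branches, and the per-sample sigmoid output are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(xs):
    return [max(0.0, x) for x in xs]


def sigmoid(x):
    # Numerically stable form for large negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def encode(waveform, kernel):
    """One hypothetical encoder branch: a single convolution plus ReLU."""
    return relu(conv1d_same(waveform, kernel))


def wevad_sketch(waveform, encoder_kernels, decoder_kernel, bias=0.0):
    """Toy multi-branch encoder-decoder operating on raw samples.

    Returns one speech probability per input sample.
    """
    # Each branch is its own convolutional encoder over the raw waveform.
    branches = [encode(waveform, k) for k in encoder_kernels]
    # Fuse the branches by averaging (an assumption; the abstract does not
    # say how the ensemble of encoders is combined).
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    # A convolutional decoder maps the fused features to one logit per sample.
    logits = conv1d_same(fused, decoder_kernel)
    return [sigmoid(z + bias) for z in logits]


# Demo with random (untrained) weights on a random "waveform".
rng = random.Random(0)
wave = [rng.uniform(-1.0, 1.0) for _ in range(64)]
kernels = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(3)]
decoder = [rng.uniform(-1.0, 1.0) for _ in range(3)]
probs = wevad_sketch(wave, kernels, decoder)
labels = [p > 0.5 for p in probs]  # True = speech, False = non-speech
```

Passing a single kernel in `encoder_kernels` gives a rough analogue of the one-encoder WVAD case, while several kernels mimic the WEVAD ensemble. Because every stage is a convolution, the same function accepts waveforms of any length.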
Related papers
- Adversarial Multi-task Deep Learning For Noise-robust Voice Activity Detection With Low Algorithmic Delay (2022) 2.26
- Advancing VAD Systems Based On Multi-task Learning With Improved Model Structures (2023) 0.00
- Voice Activity Detection: Merging Source And Filter-based Information (2019) 13.50
- X-vector Based Voice Activity Detection For Multi-genre Broadcast Speech-to-text (2021) 0.00
- Speech Enhancement Aided End-to-end Multi-task Learning For Voice Activity Detection (2020) 11.49
- Self-adaptive Soft Voice Activity Detection Using Deep Neural Networks For Robust Speaker Verification (2019) 6.77
- Personal VAD: Speaker-conditioned Voice Activity Detection (2019) 13.05
- An Ensemble SVM-based Approach For Voice Activity Detection (2019) 5.24