rVAD: An Unsupervised Segment-based Robust Voice Activity Detection Method
2019 · Zheng-Hua Tan, Achintya Kr. Sarkar, Najim Dehak
Abstract
This paper presents an unsupervised segment-based method for robust voice activity detection (rVAD). The method consists of two passes of denoising followed by a voice activity detection (VAD) stage. In the first pass, high-energy segments in a speech signal are detected by using a posteriori signal-to-noise ratio (SNR) weighted energy difference and if no pitch is detected within a segment, the segment is considered as a high-energy noise segment and set to zero. In the second pass, the speech signal is denoised by a speech enhancement method, for which several methods are explored. Next, neighbouring frames with pitch are grouped together to form pitch segments, and based on speech statistics, the pitch segments are further extended from both ends in order to include both voiced and unvoiced sounds and likely non-speech parts as well. In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice act
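The pitch-segment step described in the abstract (grouping neighbouring frames with pitch, then extending each segment from both ends) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `pitch_segments`, the boolean per-frame pitch flags, and the fixed `extend` margin are all assumptions made for illustration. The paper derives the extension from speech statistics, and the abstract does not state those values.

```python
from typing import List, Tuple


def pitch_segments(pitch_flags: List[bool], extend: int = 3) -> List[Tuple[int, int]]:
    """Group consecutive pitch frames into half-open [start, end) segments,
    widen each segment by `extend` frames on both sides, and merge overlaps.

    `extend` is a hypothetical fixed margin, not a value from the paper.
    """
    # Pass 1: collect runs of consecutive frames flagged as having pitch.
    segments: List[Tuple[int, int]] = []
    start = None
    for i, has_pitch in enumerate(pitch_flags):
        if has_pitch and start is None:
            start = i
        elif not has_pitch and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(pitch_flags)))

    # Pass 2: extend both ends (clamped to the signal) and merge
    # segments that touch or overlap after extension.
    n = len(pitch_flags)
    merged: List[Tuple[int, int]] = []
    for s, e in segments:
        s, e = max(0, s - extend), min(n, e + extend)
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

In a full pipeline, the extended segments would then be the only regions passed on to the final energy-based detection stage, which is why widening them to cover unvoiced sounds matters.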
Related papers
- Unsupervised Voice Activity Detection By Modeling Source And System Information Using Zero Frequency Filtering (2022)
- Voice Activity Detection: Merging Source And Filter-based Information (2019)
- Semantic VAD: Low-latency Voice Activity Detection For Speech Interaction (2023)
- Speech Enhancement Aided End-to-end Multi-task Learning For Voice Activity Detection (2020)
- Self-adaptive Soft Voice Activity Detection Using Deep Neural Networks For Robust Speaker Verification (2019)
- Adversarial Multi-task Deep Learning For Noise-robust Voice Activity Detection With Low Algorithmic Delay (2022)
- An Ensemble SVM-based Approach For Voice Activity Detection (2019)
- Personal VAD: Speaker-conditioned Voice Activity Detection (2019)