HyWA: Hypernetwork Weight Adapting Personalized Voice Activity Detection
2025 · Mahsa Ghazvini Nejad, Hamed Jafarzadeh Asl, Amin Edraki, et al.
Abstract
Personalized Voice Activity Detection (PVAD) systems activate only in response to a specific target speaker. Speaker-conditioning methods inject information about the target speaker into a VAD pipeline to achieve this personalization. Existing speaker-conditioning methods typically modify the inputs or activations of a VAD model. We propose an alternative perspective on speaker conditioning. Our approach, HyWA, employs a hypernetwork to generate personalized weights for a few selected layers of a standard VAD model. We evaluate HyWA against multiple baseline speaker-conditioning techniques using a fixed backbone VAD, and the comparison shows consistent improvements in PVAD performance. The approach improves on current speaker-conditioning techniques in two ways: (i) it increases the mean average precision, and (ii) it facilitates deployment by reusing the same VAD architecture.
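The idea of generating layer weights from a speaker embedding, rather than modifying inputs or activations, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the class names `HyperNetwork` and `TinyVAD`, the dimensions, the choice of the output layer as the personalized layer, and the random untrained parameters. The abstract does not specify which layers HyWA adapts or how its hypernetwork is built, so this should not be read as the authors' implementation.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


class HyperNetwork:
    """Hypothetical hypernetwork: speaker embedding -> weights of one layer."""

    def __init__(self, emb_dim, in_dim, out_dim, rng):
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weight matrix plus bias
        self.proj = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                     for _ in range(n_params)]
        self.proj_bias = [0.0] * n_params

    def generate(self, speaker_emb):
        """Return (weight, bias) for the adapted layer of the VAD."""
        flat = linear(speaker_emb, self.proj, self.proj_bias)
        split = self.out_dim * self.in_dim
        weight = [flat[r * self.in_dim:(r + 1) * self.in_dim]
                  for r in range(self.out_dim)]
        return weight, flat[split:]


class TinyVAD:
    """Toy frame classifier; the architecture is the same for every speaker."""

    def __init__(self, feat_dim, hidden_dim, rng):
        # Shared (speaker-independent) first layer.
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim

    def forward(self, frame, head_weight, head_bias):
        """Score one frame using a speaker-specific output layer."""
        hidden = [max(0.0, h) for h in linear(frame, self.w1, self.b1)]
        logit = linear(hidden, head_weight, head_bias)[0]
        return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
vad = TinyVAD(feat_dim=4, hidden_dim=3, rng=rng)
hyper = HyperNetwork(emb_dim=2, in_dim=3, out_dim=1, rng=rng)

frame = [0.5, -0.2, 0.1, 0.9]               # made-up acoustic features
w_a, b_a = hyper.generate([1.0, 0.0])       # made-up embedding, speaker A
w_b, b_b = hyper.generate([0.0, 1.0])       # made-up embedding, speaker B
p_a = vad.forward(frame, w_a, b_a)
p_b = vad.forward(frame, w_b, b_b)
```

In this sketch the generated weights can be computed once at enrollment and then loaded into an unchanged VAD architecture, which is one way to read the deployment benefit the abstract describes. Because the parameters are random, the scores `p_a` and `p_b` carry no meaning beyond showing the data flow.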