Abstract

Personalized Voice Activity Detection (PVAD) systems activate only in response to a specific target speaker. Speaker-conditioning methods are employed to inject information about the target speaker into a VAD pipeline, to achieve personalization. Existing speaker-conditioning methods typically modify the inputs or activations of a VAD model. We propose an alternative perspective to speaker conditioning. Our approach, HyWA, employs a hypernetwork to generate personalized weights for a few selected layers of a standard VAD model. We evaluate HyWA against multiple baseline speaker-conditioning techniques using a fixed backbone VAD. Our comparison shows consistent improvements in PVAD performance. This new approach improves the current speaker-conditioning techniques in two ways: i) increases the mean average precision, ii) facilitates deployment by reusing the same VAD architecture.

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations0
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score0.00
  • arxiv keynejad2025hywa

Related papers