Improving Voice Quality In Speech Anonymization With Just Perception-informed Losses
2024 · Suhita Ghosh, Tim Thiele, Frederic Lorbeer, et al.
Abstract
The increasing use of cloud-based speech assistants has heightened the need for effective speech anonymization, which aims to obscure a speaker's identity while retaining critical information for subsequent tasks. One approach to achieving this is through voice conversion. While existing methods often emphasize complex architectures and training techniques, our research underscores the importance of loss functions inspired by the human auditory system. Our proposed loss functions are model-agnostic, incorporating handcrafted and deep learning-based features to effectively capture quality representations. Through objective and subjective evaluations, we demonstrate that a VQVAE-based model, enhanced with our perception-driven losses, surpasses the vanilla model in terms of naturalness, intelligibility, and prosody while maintaining speaker anonymity. These improvements are consistently observed across various datasets, languages, target speakers, and genders.
Related papers
- Self-supervised Speech Representations Preserve Speech Characteristics While Anonymizing Voices (2022)
- Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding (2024)
- Preserving Spoken Content In Voice Anonymisation With Character-level Vocoder Conditioning (2024)
- Anonymising Elderly And Pathological Speech: Voice Conversion Using DDSP And Query-by-example (2024)
- Privacy-utility Balanced Voice De-identification Using Adversarial Examples (2022)
- Voiceprivacy 2022 System Description: Speaker Anonymization With Feature-matched F0 Trajectories (2022)
- A Speech Representation Anonymization Framework Via Selective Noise Perturbation (2022)
- The Voiceprivacy 2022 Challenge: Progress And Perspectives In Voice Anonymisation (2024)