Preserving Spoken Content In Voice Anonymisation With Character-level Vocoder Conditioning
2024 · Michele Panariello, Massimiliano Todisco, Nicholas Evans
Abstract
Voice anonymisation can be used to help protect speaker privacy when speech data is shared with untrusted others. In most practical applications, while the voice identity should be sanitised, other attributes such as the spoken content should be preserved. There is always a trade-off; all approaches reported thus far sacrifice spoken content for anonymisation performance. We report what is, to the best of our knowledge, the first attempt to actively preserve spoken content in voice anonymisation. We show how the output of an auxiliary automatic speech recognition model can be used to condition the vocoder module of an anonymisation system using a set of learnable embedding dictionaries in order to preserve spoken content. Relative to a baseline approach, and for only a modest cost in anonymisation performance, the technique is successful in decreasing the word error rate computed from anonymised utterances by almost 60%.
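The "almost 60%" figure above is presumably a relative reduction in word error rate (WER) rather than a drop in absolute percentage points. As a minimal sketch of how such a figure is typically computed, the following calculates WER as the word-level edit distance divided by the reference length, then the relative reduction between two systems. The function names and the example transcripts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved system lowers the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


# Hypothetical transcripts, invented for illustration only.
reference = "please call stella and ask her to bring these things"
baseline_hyp = "please call stellar and ask to bring this thing"
improved_hyp = "please call stella and ask her to bring this things"

baseline_wer = word_error_rate(reference, baseline_hyp)
improved_wer = word_error_rate(reference, improved_hyp)
print(f"baseline WER: {baseline_wer:.2f}")
print(f"improved WER: {improved_wer:.2f}")
print(f"relative reduction: {relative_reduction(baseline_wer, improved_wer):.0%}")
```

In practice, WER is computed over a whole test set by summing edit counts and reference lengths across utterances, and the transcripts come from an ASR system run on the anonymised audio.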
Related papers
- Self-supervised Speech Representations Preserve Speech Characteristics While Anonymizing Voices (2022)
- Improving Voice Quality In Speech Anonymization With Just Perception-informed Losses (2024)
- Vocoder Drift Compensation By X-vector Alignment In Speaker Anonymisation (2023)
- Analyzing Language-independent Speaker Anonymization Framework Under Unseen Conditions (2022)
- VoicePrivacy 2022 System Description: Speaker Anonymization With Feature-matched F0 Trajectories (2022)
- Speaker Anonymization Using Neural Audio Codec Language Models (2023)
- Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding (2024)
- Evaluation Of The Speech Resynthesis Capabilities Of The VoicePrivacy Challenge Baseline B1 (2023)