Speaker Anonymization Using Neural Audio Codec Language Models
2023 · Michele Panariello, Francesco Nespoli, Massimiliano Todisco, et al.
Abstract
The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder. Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information. We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information: we demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022.
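The claim that quantized codes bottleneck speaker-related information can be illustrated with a toy sketch. This is not the authors' system: the codebook, the feature frames, and the function names `quantize` and `dequantize` are hypothetical, and a plain nearest-neighbour lookup stands in for the learned quantizers used in neural audio codecs. It only shows that small, speaker-like perturbations of the same underlying frames can collapse onto identical discrete codes.

```python
# Toy illustration only: nearest-neighbour vector quantization in pure Python.
# The codebook, frames and function names are hypothetical, not from the paper.
from math import dist


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector."""
    return [
        min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        for frame in frames
    ]


def dequantize(codes, codebook):
    """Replace each code index with its codebook vector."""
    return [codebook[k] for k in codes]


# A tiny made-up codebook with three entries in a 2-D feature space.
CODEBOOK = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

# Two made-up "speakers" producing the same content with small offsets.
speaker_a = [(0.1, -0.1), (0.9, 1.1), (2.1, 0.1)]
speaker_b = [(-0.2, 0.1), (1.2, 0.8), (1.8, -0.2)]

codes_a = quantize(speaker_a, CODEBOOK)
codes_b = quantize(speaker_b, CODEBOOK)

# Both sequences collapse to the same codes, so the small offsets are lost.
print(codes_a, codes_b, codes_a == codes_b)
```

In this toy setting, the offsets between the two frame sequences do not survive quantization, since both decode to the same codebook vectors. How much speaker information real codec tokens retain is an empirical question, which is what the evaluation described in the abstract addresses.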
Related papers
- Speaker Anonymization Using X-vector And Neural Waveform Models (2019) 0.00
- NPU-NTU System For Voice Privacy 2024 Challenge (2024) 7.16
- Language-independent Speaker Anonymization Approach Using Self-supervised Pre-trained Models (2022) 9.92
- Voiceprivacy 2022 System Description: Speaker Anonymization With Feature-matched F0 Trajectories (2022) 0.00
- Analyzing Language-independent Speaker Anonymization Framework Under Unseen Conditions (2022) 8.09
- Speaker Anonymization With Distribution-preserving X-vector Generation For The Voiceprivacy Challenge 2020 (2020) 0.00
- Reprogramming Self-supervised Learning-based Speech Representations For Speaker Anonymization (2023) 2.26
- Exploring The Importance Of F0 Trajectories For Speaker Anonymization Using X-vectors And Neural Waveform Models (2021) 0.00