Abstract

Speech synthesis technology has brought great convenience, but the widespread use of realistic deepfake audio has created serious risks. Malicious adversaries may collect a victim's speech without authorization and clone a similar voice for illegal purposes (e.g., telecom fraud). However, existing defense methods cannot effectively prevent deepfake exploitation and are vulnerable to robust training techniques. A more effective and robust data protection method is therefore urgently needed. In response, we propose a defensive framework, SafeSpeech, which protects users' audio before it is uploaded by embedding imperceptible perturbations in the original speech to prevent high-quality speech synthesis. In SafeSpeech, we devise a robust and universal proactive protection technique, Speech PErturbative Concealment (SPEC), that leverages a surrogate model to generate universally applicable perturbation for generativ
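The general idea of adding a small, bounded perturbation to a waveform before upload can be illustrated with a minimal sketch. This is not the SPEC algorithm: the abstract says the perturbation is generated with a surrogate model, whereas the sketch below uses random noise as a stand-in. The function names, the L-infinity bound `epsilon`, and the toy sine-wave input are all assumptions made for illustration.

```python
import math
import random


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def add_bounded_perturbation(waveform, delta, epsilon):
    """Return waveform + delta, with every perturbation sample limited to
    [-epsilon, epsilon] and every output sample kept in the range [-1, 1].

    Hypothetical helper: the bound and clipping scheme are assumptions.
    """
    if len(waveform) != len(delta):
        raise ValueError("waveform and delta must have the same length")
    protected = []
    for sample, noise in zip(waveform, delta):
        bounded_noise = clip(noise, -epsilon, epsilon)
        protected.append(clip(sample + bounded_noise, -1.0, 1.0))
    return protected


def max_abs_difference(a, b):
    """Largest per-sample absolute difference between two signals."""
    return max(abs(x - y) for x, y in zip(a, b))


# Toy input: 0.1 s of a 440 Hz tone at 16 kHz (assumed values).
sample_rate = 16000
waveform = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

# Random stand-in for the perturbation; a real protection method would
# optimize it against a surrogate model instead of sampling it.
rng = random.Random(0)
delta = [rng.uniform(-0.1, 0.1) for _ in waveform]

epsilon = 0.005  # hypothetical perceptibility budget
protected = add_bounded_perturbation(waveform, delta, epsilon)
```

Whatever `delta` contains, the protected signal never deviates from the original by more than `epsilon` per sample. That bound is a simple proxy for keeping the change hard to hear.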

Tags

  • Text-to-Speech
