Training-free Deepfake Voice Recognition By Leveraging Large-scale Pre-trained Models
2024 · Alessandro Pianese, Davide Cozzolino, Giovanni Poggi, et al.
Abstract
Generalization is a main issue for current audio deepfake detectors, which struggle to provide reliable results on out-of-distribution data. Given the speed at which more and more accurate synthesis methods are developed, it is very important to design techniques that work well also on data they were not trained for. In this paper we study the potential of large-scale pre-trained models for audio deepfake detection, with special focus on generalization ability. To this end, the detection problem is reformulated in a speaker verification framework and fake audios are exposed by the mismatch between the voice sample under test and the voice of the claimed identity. With this paradigm, no fake speech sample is necessary in training, cutting off any link with the generation method at the root, and ensuring full generalization ability. Features are extracted by general-purpose large pre-trained models, with no need for training or fine-tuning on specific fake detection or speaker verificati
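The verification-style decision described in the abstract can be illustrated with a minimal sketch. It assumes that embeddings have already been extracted by some pre-trained model (the abstract excerpt does not name one), that similarity is measured with cosine similarity, and that the best match over the reference set is compared against a threshold. The function names, the max aggregation, and the threshold value are all hypothetical choices made for illustration, not the authors' stated procedure.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def identity_score(test_embedding, reference_embeddings):
    """Best similarity between the test sample and any reference sample
    of the claimed identity (max aggregation is an assumption)."""
    return max(cosine_similarity(test_embedding, ref)
               for ref in reference_embeddings)


def is_fake(test_embedding, reference_embeddings, threshold=0.5):
    """Flag the sample as fake when it does not match the claimed identity.
    The threshold of 0.5 is a placeholder, not a value from the paper."""
    return identity_score(test_embedding, reference_embeddings) < threshold


# Toy 3-D vectors standing in for real embeddings (illustrative only).
references = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]
matching = np.array([0.95, 0.05, 0.0])
mismatched = np.array([0.0, 0.0, 1.0])

print(is_fake(matching, references))
print(is_fake(mismatched, references))
```

No fake speech appears anywhere in this sketch: only real reference recordings of the claimed speaker are needed, which is the property the abstract highlights.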
Related papers
- Towards Generalisable And Calibrated Synthetic Speech Detection With Self-supervised Representations (2023) 0.00
- Zero-day Audio Deepfake Detection Via Retrieval Augmentation And Profile Matching (2025) 0.00
- Anomaly Detection And Localization For Speech Deepfakes Via Feature Pyramid Matching (2025) 4.52
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023) 9.03
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) 10.48
- Continual Learning For Fake Audio Detection (2021) 11.49
- Characterizing The Temporal Dynamics Of Universal Speech Representations For Generalizable Deepfake Detection (2023) 6.77
- Deepfake Audio As A Data Augmentation Technique For Training Automatic Speech To Text Transcription Models (2023) 2.26