Anomaly Detection And Localization For Speech Deepfakes Via Feature Pyramid Matching
2025 · Emma Coletta, Davide Salvi, Viola Negroni, et al.
Abstract
The rise of AI-driven generative models has enabled the creation of highly realistic speech deepfakes - synthetic audio signals that can imitate target speakers' voices - raising critical security concerns. Existing methods for detecting speech deepfakes primarily rely on supervised learning, which suffers from two critical limitations: limited generalization to unseen synthesis techniques and a lack of explainability. In this paper, we address these issues by introducing a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task. Our model is trained exclusively on real speech to characterize its distribution, enabling the classification of out-of-distribution samples as synthetically generated. Additionally, our framework produces interpretable anomaly maps during inference, highlighting anomalous regions across both time and frequency domains. This is done through a Student-Teacher Feature Pyramid Matching system, enhan
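As an illustration of how a student-teacher feature pyramid can yield an anomaly map, here is a minimal NumPy sketch of the generic matching step. It is not the authors' implementation: the function names, the nearest-neighbour upsampling, and the element-wise product used to fuse pyramid levels are all assumptions chosen for brevity, and the feature arrays stand in for activations that a real system would extract from a frozen teacher network and a student trained only on real speech spectrograms.

```python
import numpy as np

def l2_normalize(feat, eps=1e-8):
    """Normalize the channel vector at every (frequency, time) position.
    feat has shape (C, H, W)."""
    norm = np.linalg.norm(feat, axis=0, keepdims=True)
    return feat / (norm + eps)

def level_anomaly_map(teacher_feat, student_feat):
    """Per-position discrepancy between teacher and student features
    at one pyramid level: 0.5 * squared L2 distance of unit vectors."""
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    return 0.5 * np.sum((t - s) ** 2, axis=0)  # shape (H, W)

def upsample_nearest(amap, out_shape):
    """Nearest-neighbour resize of a 2-D map to out_shape (assumed choice;
    bilinear interpolation is another common option)."""
    h, w = amap.shape
    out_h, out_w = out_shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return amap[np.ix_(rows, cols)]

def anomaly_map(teacher_feats, student_feats, out_shape):
    """Fuse the per-level maps into one map at input resolution.
    Fusion by element-wise product is an assumption of this sketch."""
    result = np.ones(out_shape)
    for t, s in zip(teacher_feats, student_feats):
        result *= upsample_nearest(level_anomaly_map(t, s), out_shape)
    return result

# Placeholder features for a two-level pyramid (hypothetical shapes).
rng = np.random.default_rng(0)
teacher = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
student = [f.copy() for f in teacher]
# Make the student disagree with the teacher in one local region.
student[0][:, 4, 4] *= -1
student[1][:, 2, 2] *= -1
amap = anomaly_map(teacher, student, (32, 32))
score = amap.max()  # one possible utterance-level anomaly score
```

Where the student reproduces the teacher the map stays at zero, and it rises only in the region where the two disagree. With a spectrogram as input, the two map axes correspond to frequency and time, which is what makes the output readable as a localization of anomalous regions.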
Related papers
- Multi-modal Deepfake Detection And Localization With FPN-Transformer (2025) 2.23
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) 10.48
- Training-free Deepfake Voice Recognition By Leveraging Large-scale Pre-trained Models (2024) 9.23
- Pitch Imperfect: Detecting Audio Deepfakes Through Acoustic Prosodic Analysis (2025) 0.00
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023) 9.03
- Zero-day Audio Deepfake Detection Via Retrieval Augmentation And Profile Matching (2025) 0.00
- Robust AI-synthesized Speech Detection Using Feature Decomposition Learning And Synthesizer Feature Augmentation (2024) 8.35
- Adversarial Attacks On Audio Deepfake Detection: A Benchmark And Comparative Study (2025) 0.00