SLIM: Style-linguistics Mismatch Model For Generalized Audio Deepfake Detection
2024 · Yi Zhu, Surya Koppisetti, Trang Tran, et al.
Abstract
Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized by generative AI models. Existing ADD models suffer from generalization issues, with a large performance discrepancy between in-domain and out-of-domain data. Moreover, the black-box nature of existing models limits their use in real-world scenarios, where explanations are required for model decisions. To alleviate these issues, we introduce a new ADD model that explicitly uses the Style-LInguistics Mismatch (SLIM) in fake speech to separate it from real speech. SLIM first employs self-supervised pretraining on only real samples to learn the style-linguistics dependency in the real class. The learned features are then used alongside standard pretrained acoustic features (e.g., Wav2vec) to learn a classifier on the real and fake classes. When the feature encoders are frozen, SLIM outperforms benchmark methods on out-of-domain datasets while achieving competitive results on in-domain data. Th
Related papers
- TranssionADD: A Multi-frame Reinforcement Based Sequence Tagging Model For Audio Deepfake Detection (2023) · 0.00
- Heterogeneity Over Homogeneity: Investigating Multilingual Speech Pre-trained Models For Detecting Audio Deepfake (2024) · 8.09
- The Codecfake Dataset And Countermeasures For The Universally Detection Of Deepfake Audio (2024) · 10.97
- Adversarial Attacks On Audio Deepfake Detection: A Benchmark And Comparative Study (2025) · 0.00
- Betray Oneself: A Novel Audio Deepfake Detection Model Via Mono-to-stereo Conversion (2023) · 10.04
- Pitch Imperfect: Detecting Audio Deepfakes Through Acoustic Prosodic Analysis (2025) · 0.00
- AUDETER: A Large-scale Dataset For Deepfake Audio Detection In Open Worlds (2025) · 0.00
- MLAAD: The Multi-language Audio Anti-spoofing Dataset (2024) · 13.34