ERF-BA-TFD+: A Multimodal Model For Audio-visual Deepfake Detection
2025 · Xin Zhang, Jiaming Chu, Jian Zhao, et al.
Abstract
Deepfake detection is a critical task in identifying manipulated multimedia content. In real-world scenarios, deepfake content can manifest across multiple modalities, including audio and video. To address this challenge, we present ERF-BA-TFD+, a novel multimodal deepfake detection model that combines enhanced receptive field (ERF) and audio-visual fusion. Our model processes both audio and video features simultaneously, leveraging their complementary information to improve detection accuracy and robustness. The key innovation of ERF-BA-TFD+ lies in its ability to model long-range dependencies within the audio-visual input, allowing it to better capture subtle discrepancies between real and fake content. In our experiments, we evaluate ERF-BA-TFD+ on the DDL-AV dataset, which consists of both segmented and full-length video clips. Unlike previous benchmarks, which focused primarily on isolated segments, the DDL-AV dataset allows us to assess the model's performance in a more comprehen
Related papers
- AVTENet: A Human-cognition-inspired Audio-visual Transformer-based Ensemble Network For Video Deepfake Detection (2023)
- Investigating Self-supervised Representations For Audio-visual Deepfake Detection (2025)
- Multi-modal Deepfake Detection And Localization With FPN-Transformer (2025)
- Straight Through Gumbel Softmax Estimator Based Bimodal Neural Architecture Search For Audio-visual Deepfake Detection (2024)
- FADEL: Uncertainty-aware Fake Audio Detection With Evidential Deep Learning (2025)
- MFAAN: Unveiling Audio Deepfakes With A Multi-feature Authenticity Network (2023)
- Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm With Real Emphasis And Fake Dispersion Strategy (2024)
- Zero-day Audio Deepfake Detection Via Retrieval Augmentation And Profile Matching (2025)