Straight Through Gumbel Softmax Estimator Based Bimodal Neural Architecture Search For Audio-visual Deepfake Detection
2024 · Aravinda Reddy Pn, Raghavendra Ramachandra, Krothapalli Sreenivasa Rao, et al.
Abstract
Deepfakes are a major security risk for biometric authentication. The technology creates realistic fake videos that impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data characteristics and complex patterns. In this paper, we introduce the Straight-through Gumbel-Softmax (STGS) framework, a comprehensive approach to searching multimodal fusion model architectures. Using a two-level search, the framework optimizes the network architecture, parameters, and performance. First, crucial features are efficiently identified from the backbone networks; then, within the cell structure, a weighted fusion operation integrates information from various sources. An architecture that maximizes classification performance is derived by varying parameters such ...
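The abstract names the straight-through Gumbel-Softmax estimator but does not spell out how it samples an architecture choice. The following is a minimal sketch of the standard technique, not the authors' implementation: the function name `gumbel_softmax_sample`, the candidate fusion operations, and the logit values are all hypothetical and chosen only for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Return (soft, hard): relaxed probabilities and their one-hot argmax."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise via inverse transform: g = -log(-log(u)).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u inside (0, 1)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    # Straight-through forward value: a one-hot vector at the argmax.
    winner = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    return soft, hard


# Hypothetical candidate fusion operations and architecture logits.
candidate_ops = ["sum", "concat", "attention"]
arch_logits = [2.0, 0.5, -1.0]

rng = random.Random(0)  # seeded so repeated runs draw the same samples
counts = {op: 0 for op in candidate_ops}
for _ in range(2000):
    _, hard = gumbel_softmax_sample(arch_logits, temperature=0.5, rng=rng)
    counts[candidate_ops[hard.index(1.0)]] += 1

print(counts)  # operations with larger logits are selected more often
```

In an autograd framework the two outputs are combined as `hard + soft - stop_gradient(soft)`, so the forward pass uses the discrete choice while gradients flow through the soft relaxation. PyTorch exposes this behavior as `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)`.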