Vulnerability Of Automatic Identity Recognition To Audio-visual Deepfakes
2023 Β· Pavel Korshunov, Haolin Chen, Philip N. Garner, et al.
Abstract
The task of deepfakes detection is far from being solved by speech or vision researchers. Several publicly available databases of fake synthetic video and speech were built to aid the development of detection methods. However, existing databases typically focus on visual or voice modalities and provide no proof that their deepfakes can in fact impersonate any real person. In this paper, we present the first realistic audio-visual database of deepfakes SWAN-DF, where lips and speech are well synchronized and video have high visual and audio qualities. We took the publicly available SWAN dataset of real videos with different identities to create audio-visual deepfakes using several models from DeepFaceLab and blending techniques for face swapping and HiFiVC, DiffVC, YourTTS, and FreeVC models for voice conversion. From the publicly available speech dataset LibriTTS, we also created a separate database of only audio deepfakes LibriTTS-DF using several latest text to speech methods: YourTT
Authors
(none)
Tags
Stats
Related papers
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023)9.03
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022)10.48
- MFAAN: Unveiling Audio Deepfakes With A Multi-feature Authenticity Network (2023)7.81
- Adversarial Attacks On Audio Deepfake Detection: A Benchmark And Comparative Study (2025)0.00
- AUDETER: A Large-scale Dataset For Deepfake Audio Detection In Open Worlds (2025)0.00
- MLAAD: The Multi-language Audio Anti-spoofing Dataset (2024)13.34
- A Survey On Speech Deepfake Detection (2024)12.10
- Avtenet: A Human-cognition-inspired Audio-visual Transformer-based Ensemble Network For Video Deepfake Detection (2023)7.50