Benchmarking Green AI Methods for Audio Deepfake Detection: A Comparative Study of Efficiency and Accuracy

Abstract

Audio deepfake detection has become a critical challenge in AI security, driven by the rapid proliferation of advanced voice synthesis and voice conversion technologies. State-of-the-art detectors achieve high accuracy but impose substantial computational and environmental costs. Green AI offers an alternative: frozen, pre-trained self-supervised learning (SSL) models serve as feature extractors and are paired with lightweight classical machine learning classifiers, enabling CPU-only training and inference. This paper presents a systematic benchmarking study of existing Green AI approaches for audio deepfake detection, evaluating multiple SSL front-ends (wav2vec 2.0, WavLM, HuBERT) in combination with multiple classical ML back-ends (SVM-RBF, Logistic Regression, MLP) on two benchmark datasets, ASVspoof 2019 LA and ASVspoof 2021 DF. Beyond accuracy, measured by Equal Error Rate (EER), we introduce a multi-dimensional efficiency analysis covering trainable parameter count, training time, inference time, estimated energy consumption, and approximate CO2 emissions. Our results show that wav2vec 2.0 (Layer 9) features with an SVM-RBF back-end achieve the best Green AI accuracy, with an EER of 0.90% on ASVspoof 2019 LA using fewer than 1,000 trainable parameters and training in under 3 minutes on a standard CPU.
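
The two kinds of metric named in the abstract can be illustrated with a minimal sketch. It is not the paper's evaluation code. The function names `equal_error_rate` and `estimate_footprint` are hypothetical, the threshold sweep is a simple approximation of the EER rather than an interpolated one, and the power draw and grid carbon intensity in the usage lines are placeholder assumptions, not measured values.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER in [0, 1]; higher scores are assumed to mean 'more bona fide'."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def estimate_footprint(runtime_s, avg_power_w, grid_g_co2_per_kwh):
    """Return (energy in kWh, emissions in g CO2) for a run at constant average power."""
    energy_kwh = avg_power_w * runtime_s / 3_600_000  # watt-seconds to kWh
    return energy_kwh, energy_kwh * grid_g_co2_per_kwh


# Toy scores, purely illustrative.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.35], [0.1, 0.2, 0.3, 0.4])

# Assumed values: a 180 s run, 65 W average CPU power, 400 g CO2 per kWh.
energy_kwh, co2_g = estimate_footprint(180, 65, 400)
print(f"EER = {eer:.2%}, energy = {energy_kwh:.5f} kWh, CO2 = {co2_g:.2f} g")
```

The footprint estimate is only as good as its inputs: average power and grid carbon intensity vary by hardware and region, so any figures produced this way should be read as approximate.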
