Mixture Of Experts Fusion For Fake Audio Detection Using Frozen Wav2vec 2.0
2024 · Zhiyong Wang, Ruibo Fu, Zhengqi Wen, et al.
Abstract
Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of the pretrained model further enhances detection performance. However, most previously proposed fusion methods require fine-tuning the pretrained model, which results in excessively long training times and hinders model iteration when new speech synthesis technology appears. To address this issue, this paper proposes a feature fusion method based on the Mixture of Experts. The method extracts and integrates features relevant to fake audio detection from the layer features, guided by a gating network that operates on the last-layer feature, while the pretrained model stays frozen. Experiments conducted on the ASVspoof2019 and ASVspoof2021 datasets demonstrate that the proposed method achieves competitive performance compared to methods that require fine-tuning.
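The fusion idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify the expert architecture, the pooling, or the dimensions, so the linear experts, the pre-pooled per-layer vectors, the toy sizes, and every name below (`moe_fuse`, `expert_weights`, `gate_weights`) are assumptions made for illustration. The only parts taken from the abstract are that each layer's feature is processed and combined, and that the mixing weights come from a gate driven by the last-layer feature while the pretrained model itself is not updated.

```python
import math
import random


def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for trainable parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def moe_fuse(layer_feats, expert_weights, gate_weights):
    """Fuse per-layer features using gate weights computed from the last layer.

    layer_feats:    one feature vector per layer of the frozen model
                    (assumed to be already pooled over time).
    expert_weights: one projection matrix per layer (hypothetical linear experts).
    gate_weights:   matrix mapping the last-layer feature to one score per expert.
    """
    # The gate sees only the last-layer feature and outputs one weight per expert.
    gate = softmax(linear(gate_weights, layer_feats[-1]))
    # Each expert transforms the feature of its own layer.
    expert_outs = [linear(w, h) for w, h in zip(expert_weights, layer_feats)]
    # The fused feature is the gate-weighted sum of the expert outputs.
    out_dim = len(expert_outs[0])
    fused = [
        sum(g * out[k] for g, out in zip(gate, expert_outs))
        for k in range(out_dim)
    ]
    return fused, gate


rng = random.Random(0)
num_layers, feat_dim, out_dim = 4, 8, 3  # toy sizes, not from the paper

# Stand-in for hidden states of a frozen pretrained model (random here).
layer_feats = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(num_layers)]
expert_weights = [random_matrix(out_dim, feat_dim, rng) for _ in range(num_layers)]
gate_weights = random_matrix(num_layers, feat_dim, rng)

fused, gate = moe_fuse(layer_feats, expert_weights, gate_weights)
```

In a trainable version of this sketch, only `expert_weights`, `gate_weights`, and a downstream classifier would receive gradient updates; the layer features would be computed once by the frozen model, which is where the training-time saving described in the abstract would come from.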
Related papers
- Experimental Study: Enhancing Voice Spoofing Detection Models With Wav2vec 2.0 (2024) 0.00
- Automatic Speaker Verification Spoofing And Deepfake Detection Using Wav2vec 2.0 And Data Augmentation (2022) 17.35
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) 10.48
- Continual Learning For Fake Audio Detection (2021) 11.49
- Deep Residual Neural Networks For Audio Spoofing Detection (2019) 0.00
- Representation Selective Self-distillation And Wav2vec 2.0 Feature Exploration For Spoof-aware Speaker Verification (2022) 9.03
- FADEL: Uncertainty-aware Fake Audio Detection With Evidential Deep Learning (2025) 0.00
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023) 9.03