Speech Foundation Model Ensembles For The Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024
2024 · Anmol Guragain, Tianchi Liu, Zihan Pan, et al.
Abstract
This work details the approach behind our leading system, which achieves a 1.79% pooled equal error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection (CtrSVDD) challenge. The rapid advancement of generative AI models makes AI-generated deepfake singing voices increasingly difficult to detect, and the problem is attracting growing research attention. The Singing Voice Deepfake Detection (SVDD) Challenge 2024 aims to address this complex task. In this work, we explore ensemble methods that use speech foundation models to build robust singing voice anti-spoofing systems. We also introduce a novel Squeeze-and-Excitation Aggregation (SEA) method, which efficiently and effectively integrates representation features from the speech foundation models and surpasses the performance of our other individual systems. Evaluation results confirm the efficacy of our approach in detecting deepfake singing voices. The code is available at https://github.com/Anmol2059/SVDD2024.
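The headline metric above, the equal error rate, is the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The following is a minimal sketch of how an EER could be computed from detector scores, not the challenge's official scoring script. The function name `compute_eer` and the convention that a higher score means "more likely bona fide" are assumptions made for illustration. A pooled EER is assumed here to mean a single EER computed over the scores of all trials taken together.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector's scores.

    Assumption for this sketch: a higher score means the detector
    considers the trial more likely to be bona fide.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score that was observed.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER lies where the two error rates cross; with a finite set of
    # scores we take the closest point and average the two rates there.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    eer, thr = compute_eer([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.2])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

In this toy example one bona fide trial (0.4) scores below one spoofed trial (0.5), so no threshold separates the two classes perfectly and the error rates meet at 25%.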
Related papers
- CtrSVDD: A Benchmark Dataset And Baseline Analysis For Controlled Singing Voice Deepfake Detection (2024)
- SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan (2024)
- SingFake: Singing Voice Deepfake Detection (2023)
- ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection And SASV Systems For The ASVspoof5 Challenge (2024)
- VITS-based Singing Voice Conversion System With DSPGAN Post-processing For SVCC2023 (2023)
- Exploring WavLM Back-ends For Speech Spoofing And Deepfake Detection (2024)
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024)
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024)