Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment And Analysis
2024 · Zhoulin Ji, Chenhao Lin, Hang Wang, et al.
Abstract
Detecting synthetic from real speech is increasingly crucial due to the risks of misinformation and identity impersonation. While various datasets for synthetic speech analysis have been developed, they often focus on specific areas, limiting their utility for comprehensive research. To fill this gap, we propose the Speech-Forensics dataset by extensively covering authentic, synthetic, and partially forged speech samples that include multiple segments synthesized by different high-quality algorithms. Moreover, we propose a TEmporal Speech LocalizaTion network, called TEST, aiming at simultaneously performing authenticity detection, multiple fake segments localization, and synthesis algorithms recognition, without any complex post-processing. TEST effectively integrates LSTM and Transformer to extract more powerful temporal speech representations and utilizes dense prediction on multi-scale pyramid features to estimate the synthetic spans. Our model achieves an average mAP of 83.55% and
Related papers
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022)
- Open Challenges In Synthetic Speech Detection (2022)
- AUDETER: A Large-scale Dataset For Deepfake Audio Detection In Open Worlds (2025)
- Lightweight Model Attribution And Detection Of Synthetic Speech Via Audio Residual Fingerprints (2024)
- Syn-Att: Synthetic Speech Attribution Via Semi-supervised Unknown Multi-class Ensemble Of CNNs (2023)
- MLAAD: The Multi-language Audio Anti-spoofing Dataset (2024)
- SafeSpeech: Robust And Universal Voice Protection Against Malicious Speech Synthesis (2025)
- Detection Of AI-synthesized Speech Using Cepstral & Bispectral Statistics (2020)