Robust AI-synthesized Speech Detection Using Feature Decomposition Learning And Synthesizer Feature Augmentation
2024 · Kuiyuan Zhang, Zhongyun Hua, Yushu Zhang, et al.
Abstract
AI-synthesized speech, also known as deepfake speech, has recently raised significant concerns due to the rapid advancement of speech synthesis and speech conversion techniques. Previous works often rely on distinguishing synthesizer artifacts to identify deepfake speech. However, excessive reliance on these specific synthesizer artifacts may result in unsatisfactory performance when addressing speech signals created by unseen synthesizers. In this paper, we propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features as complementary for detection. Specifically, we propose a dual-stream feature decomposition learning strategy that decomposes the learned speech representation using a synthesizer stream and a content stream. The synthesizer stream specializes in learning synthesizer features through supervised training with synthesizer labels. Meanwhile, the content stream focuses on learning synthesizer-independe
Related papers
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) · 10.48
- Anomaly Detection And Localization For Speech Deepfakes Via Feature Pyramid Matching (2025) · 4.52
- Detection Of AI-synthesized Speech Using Cepstral & Bispectral Statistics (2020) · 0.00
- Lightweight Model Attribution And Detection Of Synthetic Speech Via Audio Residual Fingerprints (2024) · 0.00
- Self-attention And Hybrid Features For Replay And Deep-fake Audio Detection (2024) · 0.00
- A Survey On Speech Deepfake Detection (2024) · 12.10
- SafeSpeech: Robust And Universal Voice Protection Against Malicious Speech Synthesis (2025) · 0.00
- Syn-Att: Synthetic Speech Attribution Via Semi-supervised Unknown Multi-class Ensemble Of CNNs (2023) · 0.00