Defense Against Synthetic Speech: Real-time Detection Of RVC Voice Conversion Attacks
2025 · Prajwal Chinchmalatpure, Suyash Chinchmalatpure, Siddharth Chavan
Abstract
Generative audio technologies now enable highly realistic voice cloning and real-time voice conversion, increasing the risk of impersonation, fraud, and misinformation in communication channels such as phone and video calls. This study investigates real-time detection of AI-generated speech produced using Retrieval-based Voice Conversion (RVC), evaluated on the DEEP-VOICE dataset, which includes authentic and voice-converted speech samples from multiple well-known speakers. To simulate realistic conditions, deepfake generation is applied to isolated vocal components, followed by the reintroduction of background ambiance to suppress trivial artifacts and emphasize conversion-specific cues. We frame detection as a streaming classification task by dividing audio into one-second segments, extracting time-frequency and cepstral features, and training supervised machine learning models to classify each segment as real or voice-converted. The proposed system enables low-latency inference, sup
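The streaming framing described in the abstract (one-second segments, each turned into time-frequency and cepstral features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the non-overlapping segmentation, the 64-sample frame length, the naive DFT, and the DCT-based cepstral coefficients are all assumptions chosen to keep the example dependency-free. The resulting per-segment vectors are the kind of input a supervised classifier would be trained on.

```python
import cmath
import math


def split_into_segments(samples, sample_rate, seconds=1.0):
    """Cut a mono signal into non-overlapping fixed-length segments.

    Assumption: the trailing partial segment is dropped.
    """
    size = int(sample_rate * seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def dft_magnitudes(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def segment_features(segment, frame_len=64, n_cepstra=4):
    """Return a feature vector for one segment.

    The vector is the frame-averaged log-magnitude spectrum (time-frequency cue)
    followed by a few DCT-II coefficients of that spectrum (cepstral-style cue).
    Both choices are illustrative assumptions.
    """
    frames = [
        segment[i:i + frame_len]
        for i in range(0, len(segment) - frame_len + 1, frame_len)
    ]
    n_bins = frame_len // 2 + 1
    mean_log_spec = [0.0] * n_bins
    for frame in frames:
        for k, mag in enumerate(dft_magnitudes(frame)):
            # Small offset avoids log(0) on silent bins.
            mean_log_spec[k] += math.log(mag + 1e-10) / len(frames)
    cepstra = [
        sum(v * math.cos(math.pi * q * (k + 0.5) / n_bins)
            for k, v in enumerate(mean_log_spec))
        for q in range(n_cepstra)
    ]
    return mean_log_spec + cepstra
```

In a streaming setting, each incoming one-second segment would be passed through `segment_features` and the vector handed to a trained model, so the decision latency is bounded by the segment length plus feature extraction time. A real system would likely replace the naive DFT with an FFT-based library routine.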
Related papers
- Securing Voice-driven Interfaces Against Fake (cloned) Audio Attacks (2019) 9.92
- One-class Learning Towards Synthetic Voice Spoofing Detection (2020) 17.31
- Vsmask: Defending Against Voice Synthesis Attack Via Real-time Predictive Perturbation (2023) 7.81
- Beyond Voice Identity Conversion: Manipulating Voice Attributes By Adversarial Learning Of Structured Disentangled Representations (2021) 0.00
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023) 9.03
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022) 10.48
- What To Remember: Self-adaptive Continual Learning For Audio Deepfake Detection (2023) 10.48
- Single And Multi-speaker Cloned Voice Detection: From Perceptual To Learned Features (2023) 9.23