Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
2020 · Arun Kumar Singh, Priyanka Singh
Abstract
Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation is exciting, but it also raises alarming concerns, since such tools can be used to produce speech clones, duplicates, or deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech using bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
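To make the two analyses named in the abstract concrete, here is a minimal sketch, assuming NumPy and a mono signal held in a 1-D array. It computes a real cepstrum for one frame and a segment-averaged bicoherence (a normalized bispectrum) over the whole signal. The function names, frame length, hop size, and Hann window are illustrative assumptions and are not taken from the paper; the authors' actual feature extraction may differ.

```python
import numpy as np


def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real


def bicoherence(signal, frame_len=256, hop=128):
    """Segment-averaged bicoherence estimate (hypothetical helper).

    Returns a (frame_len // 2, frame_len // 2) array with values in [0, 1];
    entry [f1, f2] measures phase coupling between bins f1, f2 and f1 + f2.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    half = frame_len // 2
    idx = np.arange(half)
    sum_idx = idx[:, None] + idx[None, :]  # bin index of f1 + f2

    num = np.zeros((half, half), dtype=complex)
    den1 = np.zeros((half, half))
    den2 = np.zeros((half, half))
    for k in range(n_frames):
        frame = signal[k * hop : k * hop + frame_len] * window
        X = np.fft.fft(frame)
        prod = X[idx][:, None] * X[idx][None, :]  # X(f1) * X(f2)
        conj_sum = np.conj(X[sum_idx])            # X*(f1 + f2)
        num += prod * conj_sum
        den1 += np.abs(prod) ** 2
        den2 += np.abs(conj_sum) ** 2
    # Cauchy-Schwarz normalization keeps the magnitude between 0 and 1.
    return np.abs(num) / np.sqrt(den1 * den2 + 1e-12)
```

Summary statistics of the bicoherence magnitude and of the cepstral coefficients could then be concatenated into a feature vector for a classifier; which statistics and which classifier to use is left open here, since the abstract does not specify them.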
Related papers
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022)
- Using Deep Learning Techniques And Inferential Speech Statistics For AI Synthesised Speech Recognition (2021)
- Syn-att: Synthetic Speech Attribution Via Semi-supervised Unknown Multi-class Ensemble Of CNNs (2023)
- The Sound Of Silence: Efficiency Of First Digit Features In Synthetic Audio Detection (2022)
- Evince The Artifacts Of Spoof Speech By Blending Vocal Tract And Voice Source Features (2022)
- Robust AI-synthesized Speech Detection Using Feature Decomposition Learning And Synthesizer Feature Augmentation (2024)
- Lightweight Model Attribution And Detection Of Synthetic Speech Via Audio Residual Fingerprints (2024)
- Speech-forensics: Towards Comprehensive Synthetic Speech Dataset Establishment And Analysis (2024)