Distinguishing Neural Speech Synthesis Models Through Fingerprints In Speech Waveforms
2023 · Chu Yuan Zhang, Jiangyan Yi, Jianhua Tao, et al.
Abstract
Recent advances in neural speech synthesis, while widely applied, have also introduced a series of challenges, spurring interest in defending against its misuse and abuse. Notably, source attribution of synthesized speech has value in forensics and intellectual property protection, but prior work in this area is limited in scope. To address these gaps, this paper presents our findings on identifying the sources of synthesized speech. We investigate whether speech synthesis models leave fingerprints in the waveforms they generate, focusing on the acoustic model and the vocoder, and study how each component contributes to the fingerprint of the overall speech waveform. Our research, conducted on the multi-speaker LibriTTS dataset, yields two key insights: (1) vocoders and acoustic models impart distinct, model-specific fingerprints on the waveforms they generate, and (2) vocoder fingerp
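The abstract does not describe how fingerprints are extracted or compared. As a rough illustration of the general idea that a generator can leave a consistent, content-independent trace in its output, here is a minimal sketch. It is not the authors' method: the high-pass residual, the averaged magnitude spectrum, the nearest-centroid classifier, the frame length, and the `fake_clip` generator (a toy stand-in in which each hypothetical "vocoder" adds a faint tone at its own frequency bin) are all assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math
import random
from typing import Dict, List, Sequence

FRAME_LEN = 64  # samples per analysis frame (assumed value, not from the paper)


def residual(x: Sequence[float], k: int = 3) -> List[float]:
    """Signal minus a centred moving average of width 2k+1 (a crude high-pass).

    Removing the smooth component attenuates low-frequency content and keeps
    fine high-frequency detail, where this sketch assumes artefacts live.
    """
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)
        out.append(x[i] - sum(x[lo:hi]) / (hi - lo))
    return out


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|DFT| for bins 0..N/2 via a naive O(N^2) DFT; adequate for short frames."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags


def fingerprint(x: Sequence[float]) -> List[float]:
    """Unit-norm average residual magnitude spectrum over non-overlapping frames."""
    r = residual(x)
    frames = [r[i:i + FRAME_LEN] for i in range(0, len(r) - FRAME_LEN + 1, FRAME_LEN)]
    spectra = [magnitude_spectrum(f) for f in frames]
    avg = [sum(col) / len(spectra) for col in zip(*spectra)]
    norm = math.sqrt(sum(v * v for v in avg)) or 1.0
    return [v / norm for v in avg]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def build_centroids(train: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """One reference fingerprint per source model: the mean of its clips' fingerprints."""
    centroids = {}
    for model, clips in train.items():
        fps = [fingerprint(c) for c in clips]
        centroids[model] = [sum(col) / len(fps) for col in zip(*fps)]
    return centroids


def attribute(x: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Nearest-centroid attribution: pick the model whose centroid is most cosine-similar."""
    fp = fingerprint(x)
    return max(centroids, key=lambda m: cosine(fp, centroids[m]))


def fake_clip(artifact_bin: int, seed: int, n: int = 1024) -> List[float]:
    """Hypothetical toy clip: random 'content' plus a faint tone at a model-specific bin."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + 0.8 * math.sin(2 * math.pi * artifact_bin * t / FRAME_LEN)
            for t in range(n)]


if __name__ == "__main__":
    centroids = build_centroids({
        "vocoder_A": [fake_clip(20, s) for s in range(4)],
        "vocoder_B": [fake_clip(28, s) for s in range(4, 8)],
    })
    print(attribute(fake_clip(20, 100), centroids))
    print(attribute(fake_clip(28, 200), centroids))
```

The residual step is there so that what is compared is the generator's trace rather than the spoken content, which differs from clip to clip. A real attribution system would more likely learn its features with a trained classifier; the hand-built features and nearest-centroid rule here only show that a stable artefact is enough to identify its source.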
Related papers
- Lightweight Model Attribution And Detection Of Synthetic Speech Via Audio Residual Fingerprints (2024) · 0.00
- Evince The Artifacts Of Spoof Speech By Blending Vocal Tract And Voice Source Features (2022) · 0.00
- The Sound Of Silence: Efficiency Of First Digit Features In Synthetic Audio Detection (2022) · 7.50
- Collaborative Watermarking For Adversarial Speech Synthesis (2023) · 0.00
- A Comparison Of Recent Waveform Generation And Acoustic Modeling Methods For Neural-network-based Speech Synthesis (2018) · 11.76
- Speaker Anonymization Using X-vector And Neural Waveform Models (2019) · 0.00
- Detection Of Doctored Speech: Towards An End-to-end Parametric Learn-able Filter Approach (2022) · 0.00
- Evaluation Of The Speech Resynthesis Capabilities Of The Voiceprivacy Challenge Baseline B1 (2023) · 3.58