Spoofing Speaker Verification Systems With Deep Multi-speaker Text-to-speech Synthesis
2019 · Mingrui Yuan, Zhiyao Duan
Abstract
This paper proposes a deep multi-speaker text-to-speech (TTS) model for spoofing speaker verification (SV) systems. The model uses one network to synthesize time-downsampled mel-spectrograms from text input and a second network to convert them to linear-frequency spectrograms, which are then converted to time-domain waveforms using the Griffin-Lim algorithm. The two networks are trained separately under the generative adversarial network (GAN) framework. Spoofing experiments on two state-of-the-art SV systems (i-vectors and Google's GE2E) show that the proposed system spoofs both with a high success rate. Spoofing experiments on anti-spoofing systems (i.e., binary classifiers that discriminate real from synthetic speech) also show a high spoof success rate when the structures of those anti-spoofing systems are exposed to the proposed TTS system.
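The abstract reports a "spoof success rate" without defining it here. As a minimal sketch, assuming it means the fraction of synthetic utterances that an SV system accepts as the target speaker, it could be computed as below. The cosine scoring, the threshold value, the toy two-dimensional embeddings, and the function names are all illustrative assumptions, not the authors' evaluation code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def spoof_success_rate(enrolled, spoofed_trials, threshold):
    """Fraction of spoofed trials the (hypothetical) verifier accepts.

    enrolled:       embedding of the target speaker's enrollment speech
    spoofed_trials: embeddings of synthetic utterances imitating that speaker
    threshold:      acceptance threshold of the verifier (assumed value)
    """
    if not spoofed_trials:
        raise ValueError("need at least one spoofed trial")
    accepted = sum(
        1 for emb in spoofed_trials
        if cosine_similarity(enrolled, emb) >= threshold
    )
    return accepted / len(spoofed_trials)


# Toy data: real SV embeddings have hundreds of dimensions.
enrolled = [1.0, 0.0]
trials = [[0.9, 0.1], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
rate = spoof_success_rate(enrolled, trials, threshold=0.8)
print(f"spoof success rate: {rate:.2f}")
```

In this toy run, two of the four synthetic embeddings score above the threshold, so half of the spoofing attempts would be accepted. In practice the threshold would usually be fixed beforehand on genuine trials, for example at the verifier's equal error rate operating point.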
Related papers
- Transforming Acoustic Characteristics To Deceive Playback Spoofing Countermeasures Of Speaker Verification Systems (2018) 6.34
- Spoof Detection Using Time-delay Shallow Neural Network And Feature Switching (2019) 8.35
- One-class Learning Towards Synthetic Voice Spoofing Detection (2020) 17.31
- Representation Selective Self-distillation And Wav2vec 2.0 Feature Exploration For Spoof-aware Speaker Verification (2022) 9.03
- Deep Residual Neural Networks For Audio Spoofing Detection (2019) 0.00
- Spoofing-robust Speaker Verification Using Parallel Embedding Fusion: BTU Speech Group's Approach For Asvspoof5 Challenge (2024) 5.24
- Securing Voice Biometrics: One-shot Learning Approach For Audio Deepfake Detection (2023) 9.03
- Automatic Speaker Verification Spoofing And Deepfake Detection Using Wav2vec 2.0 And Data Augmentation (2022) 17.35