An End-to-end Multi-module Audio Deepfake Generation System For ADD Challenge 2023
2023 · Sheng Zhao, Qilong Yuan, Yibo Duan, et al.
Abstract
The task of synthetic speech generation is to produce spoken language content from a given text, simulating a human voice. The key factors that determine the quality of synthetic speech generation include generation speed, word segmentation accuracy, and the naturalness of the synthesized speech. This paper builds an end-to-end multi-module synthetic speech generation model comprising a speaker encoder, a synthesizer based on Tacotron2, and a vocoder based on WaveRNN. In addition, we run extensive comparative experiments on different datasets and model structures. Our system won first place in Track 1.1 of the ADD 2023 challenge with a weighted deception success rate (WDSR) of 44.97%.
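The three modules named in the abstract can be pictured as a simple chain: speaker encoder, then synthesizer, then vocoder. The following is a minimal sketch of that data flow only, not the authors' implementation. It assumes the speaker encoder consumes a reference recording, which the abstract does not state. All names (`DeepfakePipeline`, `toy_speaker_encoder`, `toy_synthesizer`, `toy_vocoder`) are hypothetical, and the toy functions are trivial stand-ins for the real Tacotron2 and WaveRNN models.

```python
from dataclasses import dataclass
from typing import Callable, List

# Plain-list type aliases keep the sketch dependency-free.
Embedding = List[float]
MelFrames = List[List[float]]
Waveform = List[float]


@dataclass
class DeepfakePipeline:
    """Hypothetical wrapper that chains the three modules from the abstract."""
    speaker_encoder: Callable[[Waveform], Embedding]      # assumed input: reference audio
    synthesizer: Callable[[str, Embedding], MelFrames]    # Tacotron2-based in the paper
    vocoder: Callable[[MelFrames], Waveform]              # WaveRNN-based in the paper

    def generate(self, text: str, reference_audio: Waveform) -> Waveform:
        embedding = self.speaker_encoder(reference_audio)  # who should it sound like
        mel = self.synthesizer(text, embedding)            # what is said, as spectrogram frames
        return self.vocoder(mel)                           # spectrogram frames to audio samples


# Toy stand-ins (NOT real models): they only make the data flow executable.
def toy_speaker_encoder(audio: Waveform) -> Embedding:
    mean = sum(audio) / len(audio) if audio else 0.0
    return [mean] * 4


def toy_synthesizer(text: str, embedding: Embedding) -> MelFrames:
    return [list(embedding) for _ in text]  # one fake frame per character


def toy_vocoder(mel: MelFrames, hop: int = 2) -> Waveform:
    return [frame[0] for frame in mel for _ in range(hop)]  # `hop` samples per frame


pipeline = DeepfakePipeline(toy_speaker_encoder, toy_synthesizer, toy_vocoder)
fake_audio = pipeline.generate("hi", reference_audio=[0.5, 0.5])
```

Keeping each stage behind a plain callable means that a module can be swapped without touching the others, which suits the kind of comparison across model structures the abstract mentions.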
Related papers
- The Vicomtech Audio Deepfake Detection System Based On Wav2Vec2 For The 2022 ADD Challenge (2022)
- TranssionADD: A Multi-frame Reinforcement Based Sequence Tagging Model For Audio Deepfake Detection (2023)
- AUDETER: A Large-scale Dataset For Deepfake Audio Detection In Open Worlds (2025)
- DeepAudio-V1: Towards Multi-modal Multi-stage End-to-end Video To Speech And Audio Generation (2025)
- Exploring WavLM Back-ends For Speech Spoofing And Deepfake Detection (2024)
- ASASVIcomtech: The Vicomtech-UGR Speech Deepfake Detection And SASV Systems For The ASVspoof5 Challenge (2024)
- Diffuse Or Confuse: A Diffusion Deepfake Speech Dataset (2024)
- Adaptive Re-calibration Of Channel-wise Features For Adversarial Audio Classification (2022)