Multi-perspective Information Fusion Res2Net With RandomSpecmix For Fake Speech Detection
2023 · Shunbo Dong, Jun Xue, Cunhang Fan, et al.
Abstract
In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The system aims to improve the model's ability to learn precise forgery information for the FSD task in low-quality scenarios. Random Specmix, a data augmentation method, improves the model's generalization ability and its ability to locate discriminative information. Specmix cuts and pastes frequency-dimension information between spectrograms in the same batch without introducing other data, which helps the model locate the genuinely useful information. At the same time, we randomly select which samples to augment, so that the augmentation does not directly alter all of the data. Once the model can locate the relevant information, it is also important to reduce unnecessary information. The role of MPIF-Res2Net is to reduce redundant interference information. Deceptiv
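The random Specmix idea described in the abstract (swap a frequency band between spectrograms of the same batch, and augment only a random subset of samples) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `random_specmix`, the selection probability `p`, the maximum band fraction `max_band`, the single contiguous band, and the CutMix-style label weight `lam` are all assumptions that the abstract does not specify.

```python
import numpy as np

def random_specmix(specs, labels, p=0.5, max_band=0.3, rng=None):
    """Hypothetical sketch of random Specmix on a batch.

    specs:  (B, F, T) array of spectrograms (frequency on axis 1).
    labels: (B,) array of class labels.
    p:        assumed probability that a sample is augmented.
    max_band: assumed upper bound on the swapped band, as a fraction of F.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, n_freq, _ = specs.shape
    mixed = specs.copy()
    partner = rng.permutation(batch)      # donor sample drawn from the same batch
    lam = np.ones(batch)                  # fraction of frequency bins kept from the original
    chosen = rng.random(batch) < p        # only a random subset is augmented
    max_width = max(1, int(max_band * n_freq))
    for i in np.flatnonzero(chosen):
        width = int(rng.integers(1, max_width + 1))
        start = int(rng.integers(0, n_freq - width + 1))
        # Paste the donor's frequency band over the same band of sample i.
        mixed[i, start:start + width, :] = specs[partner[i], start:start + width, :]
        lam[i] = 1.0 - width / n_freq
    return mixed, labels, labels[partner], lam
```

Under these assumptions, a training loop would weight the loss for each sample as `lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b)`, where `y_a` and `y_b` are the second and third return values. Samples that were not selected keep `lam == 1` and pass through unchanged.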
Related papers
- Spatial Reconstructed Local Attention Res2Net With F0 Subband For Fake Speech Detection (2023)
- Mixture Of Experts Fusion For Fake Audio Detection Using Frozen wav2vec 2.0 (2024)
- Learning From Yourself: A Self-distillation Method For Fake Speech Detection (2023)
- Deep Residual Neural Networks For Audio Spoofing Detection (2019)
- Heterogeneity Over Homogeneity: Investigating Multilingual Speech Pre-trained Models For Detecting Audio Deepfake (2024)
- GMM-ResNet2: Ensemble Of Group ResNet Networks For Synthetic Speech Detection (2024)
- Combining Automatic Speaker Verification And Prosody Analysis For Synthetic Speech Detection (2022)
- MixSpeech: Data Augmentation For Low-resource Automatic Speech Recognition (2021)