GMM-ResNet2: Ensemble Of Group ResNet Networks For Synthetic Speech Detection
2024 · Zhenchun Lei, Hui Yan, Changhong Liu, et al.
Abstract
Deep learning models are widely used for speaker recognition and spoofed speech detection. We propose GMM-ResNet2 for synthetic speech detection. Compared with the earlier GMM-ResNet model, GMM-ResNet2 has four improvements. First, GMMs of different orders differ in their ability to form smooth approximations of the feature distribution, so multiple GMMs are used to extract multi-scale Log Gaussian Probability features. Second, grouping is used to improve classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time; the final score is obtained by averaging the outputs of all group classifiers. Third, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, the GMM-ResNet2 a
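The two mechanisms the abstract describes most concretely can be sketched in plain Python: per-component Log Gaussian Probability (LGP) features computed from several GMMs of different orders, and averaging the scores of the group classifiers. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes diagonal covariances and takes the LGP feature of frame x_t for component i to be log(w_i · N(x_t; μ_i, Σ_i)). The names `DiagGMM`, `lgp_features`, `multi_scale_lgp`, `ensemble_score`, and `make_gmm` are hypothetical, and the grouped ResNet branches that would consume these features are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Hypothetical diagonal-covariance GMM container (assumption, not the paper's code)."""
    weights: List[float]          # mixture weights w_i, assumed to sum to 1
    means: List[List[float]]      # component means mu_i, each of length D
    variances: List[List[float]]  # diagonal variances sigma_i^2, each of length D


def log_gaussian(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xd - m) ** 2 / v
        for xd, m, v in zip(x, mean, var)
    )


def lgp_features(frames: Sequence[Sequence[float]], gmm: DiagGMM) -> List[List[float]]:
    """T x M map: log(w_i * N(x_t; mu_i, Sigma_i)) per frame t and component i (assumed definition)."""
    return [
        [math.log(w) + log_gaussian(x, mu, var)
         for w, mu, var in zip(gmm.weights, gmm.means, gmm.variances)]
        for x in frames
    ]


def multi_scale_lgp(frames: Sequence[Sequence[float]], gmms: Sequence[DiagGMM]) -> List[List[List[float]]]:
    """One LGP map per GMM; GMMs of different orders give maps of different widths."""
    return [lgp_features(frames, g) for g in gmms]


def ensemble_score(group_scores: Sequence[float]) -> float:
    """Final score as the plain average of all group-classifier scores."""
    if not group_scores:
        raise ValueError("need at least one group score")
    return sum(group_scores) / len(group_scores)


def make_gmm(order: int, dim: int) -> DiagGMM:
    """Toy GMM for illustration: uniform weights, means spread on a line, unit variances."""
    return DiagGMM(
        weights=[1.0 / order] * order,
        means=[[float(i)] * dim for i in range(order)],
        variances=[[1.0] * dim for _ in range(order)],
    )


# Three 2-dimensional frames scored against GMMs of order 2 and order 4.
frames = [[0.0, 0.5], [1.0, -0.5], [2.0, 0.0]]
maps = multi_scale_lgp(frames, [make_gmm(2, 2), make_gmm(4, 2)])
print([(len(m), len(m[0])) for m in maps])  # [(3, 2), (3, 4)]
print(ensemble_score([0.2, 0.4, 0.9]))
```

Each component is kept as its own feature rather than being summed into one log-likelihood, so every GMM yields a frames-by-components map that a downstream network can process; using GMMs of several orders gives several such maps at different resolutions of the feature distribution.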
Related papers
- Deep Residual Neural Networks For Audio Spoofing Detection (2019) · 0.00
- Replay And Synthetic Speech Detection With Res2Net Architecture (2020) · 16.32
- GMM-ResNext: Combining Generative And Discriminative Models For Speaker Verification (2024) · 4.52
- Synthetic Voice Detection And Audio Splicing Detection Using SE-Res2Net-Conformer Architecture (2022) · 6.77
- Spoofing Speaker Verification Systems With Deep Multi-speaker Text-to-speech Synthesis (2019) · 0.00
- Spoof Detection Using Time-delay Shallow Neural Network And Feature Switching (2019) · 8.35
- Syn-Att: Synthetic Speech Attribution Via Semi-supervised Unknown Multi-class Ensemble Of CNNs (2023) · 0.00
- Spatial Reconstructed Local Attention Res2Net With F0 Subband For Fake Speech Detection (2023) · 8.82