Replay And Synthetic Speech Detection With Res2net Architecture
2020 · Xu Li, Na Li, Chao Weng, et al.
Abstract
Existing approaches for replay and synthetic speech detection still lack generalizability to unseen spoofing attacks. This work proposes to leverage a novel model architecture, Res2Net, to improve the anti-spoofing countermeasure's generalizability. Res2Net modifies the ResNet block to enable multiple feature scales. Specifically, it splits the feature maps within one block into multiple channel groups and adds a residual-like connection across the different channel groups. This connection increases the number of possible receptive fields, resulting in multiple feature scales. The multi-scale mechanism significantly improves the countermeasure's generalizability to unseen spoofing attacks. It also decreases the model size compared to ResNet-based models. Experimental results show that the Res2Net model consistently outperforms ResNet34 and ResNet50 by a large margin in both the physical access (PA) and logical access (LA) scenarios of the ASVspoof 2019 corpus. Moreover, integration with the
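The channel-group mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it works on 1-D lists instead of 2-D feature maps, uses a fixed 3-tap box filter (the hypothetical helper `conv3`) in place of learned 3x3 convolutions, and follows the commonly cited Res2Net formulation y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}), which the abstract itself does not spell out. Everything else in a real block is omitted. The point is only to show how the cross-group connection widens the receptive field from one group to the next.

```python
def conv3(signal):
    """3-tap box filter with zero padding.

    Assumption: stands in for a learned 3x3 convolution; weights are all 1.
    """
    padded = [0.0] + list(signal) + [0.0]
    return [padded[i] + padded[i + 1] + padded[i + 2] for i in range(len(signal))]


def res2net_split(groups):
    """Apply the hierarchical residual-like connection across channel groups.

    y_1 = x_1
    y_2 = conv3(x_2)
    y_i = conv3(x_i + y_{i-1})  for i > 2
    """
    outputs = [list(groups[0])]  # first group passes through unchanged
    prev = None
    for x in groups[1:]:
        if prev is None:
            inp = list(x)
        else:
            # add the previous group's output before filtering
            inp = [a + b for a, b in zip(x, prev)]
        prev = conv3(inp)
        outputs.append(prev)
    return outputs


def receptive_widths(outputs):
    """Count non-zero positions in each group's response to a unit impulse."""
    return [sum(1 for v in y if v != 0) for y in outputs]


# Four channel groups, each fed the same unit impulse of length 9.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
outs = res2net_split([impulse] * 4)
widths = receptive_widths(outs)
print(widths)  # [1, 3, 5, 7]
```

Each group sees a wider context than the one before it, so concatenating the groups yields features at several scales within a single block.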
Related papers
- Synthetic Voice Detection And Audio Splicing Detection Using SE-Res2Net-Conformer Architecture (2022) · 6.77
- Deep Residual Neural Networks For Audio Spoofing Detection (2019) · 0.00
- GMM-ResNet2: Ensemble Of Group ResNet Networks For Synthetic Speech Detection (2024) · 7.16
- A Study On Convolutional Neural Network Based End-to-end Replay Anti-spoofing (2018) · 0.00
- A Comparative Study On Recent Neural Spoofing Countermeasures For Synthetic Speech Detection (2021) · 0.00
- Replay Attack Detection With Complementary High-resolution Information Using End-to-end DNN For The ASVspoof 2019 Challenge (2019) · 11.39
- Improving Short Utterance Anti-spoofing With AASIST2 (2023) · 11.49
- Experimental Study: Enhancing Voice Spoofing Detection Models With Wav2vec 2.0 (2024) · 0.00