RawBMamba: End-to-end Bidirectional State Space Model For Audio Deepfake Detection
2024 · Yujie Chen, Jiangyan Yi, Jun Xue, et al.
Abstract
Fake artefacts that distinguish bonafide from fake audio can occur in both short- and long-range segments, so combining local and global feature information can help discriminate between the two. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use a sinc layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba that addresses Mamba's unidirectional modelling limitation and further captures long-range feature information. Moreover, we develop a bidirectional fusion module that integrates the embeddings, enhancing the audio context representation and combining short- and long-range information. The results show that RawBMamba achieves a 34.1% improvement over Rawformer on the ASVspoof2021 LA dataset and demonstrates competitive performance on other datasets.
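The abstract describes the design only at a high level. The following pure-Python sketch illustrates the two ideas it names, under heavy simplifying assumptions. It uses a SincNet-style band-pass filter (the standard sinc-layer formulation, assumed here rather than taken from the paper) for short-range features, and a toy single-channel selective state space scan run both forward and over the time-reversed sequence to show why a bidirectional scan sees future context. All function names (`sinc_bandpass`, `selective_scan`, `bidirectional_scan`, `fuse_average`, `toy_pipeline`), parameter values, the shared forward/backward parameters and the averaging fusion are hypothetical. The paper's actual layer sizes and bidirectional fusion module are not specified in the abstract.

```python
import math
from typing import List, Sequence, Tuple


def sinc_bandpass(f1: float, f2: float, kernel_size: int = 31) -> List[float]:
    """SincNet-style band-pass FIR taps; cutoffs are normalized (cycles/sample), 0 <= f1 < f2 <= 0.5."""
    half = kernel_size // 2
    taps = []
    for i in range(kernel_size):
        n = i - half
        if n == 0:
            h = 2.0 * (f2 - f1)  # limit of the sinc difference at n = 0
        else:
            # 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), with sinc(x) = sin(x)/x
            h = (math.sin(2 * math.pi * f2 * n) - math.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        # Hamming window reduces ripple caused by truncating the ideal filter
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (kernel_size - 1))
        taps.append(h * w)
    return taps


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1-D cross-correlation whose output has the same length as x."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - half
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def softplus(z: float) -> float:
    return math.log1p(math.exp(z)) if z < 30 else z


def selective_scan(x: Sequence[float], a=(-1.0, -0.5), w_delta=1.0, b_delta=0.0,
                   w_b=(1.0, 0.5), w_c=(0.5, 1.0)) -> List[float]:
    """Toy selective SSM: h_t = exp(d_t*A) h_{t-1} + d_t*B_t*x_t, y_t = <C_t, h_t>."""
    h = [0.0] * len(a)
    ys = []
    for xt in x:
        delta = softplus(w_delta * xt + b_delta)  # input-dependent step size
        bt = [wb * xt for wb in w_b]               # input-dependent B_t
        ct = [wc * xt for wc in w_c]               # input-dependent C_t
        # A discretized as exp(delta*A); B approximated by delta*B_t
        h = [math.exp(delta * a[i]) * h[i] + delta * bt[i] * xt for i in range(len(a))]
        ys.append(sum(ct[i] * h[i] for i in range(len(a))))
    return ys


def bidirectional_scan(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Forward scan plus a scan of the reversed sequence, re-aligned in time (shared toy params)."""
    fwd = selective_scan(x)
    bwd = selective_scan(list(x)[::-1])[::-1]
    return fwd, bwd


def fuse_average(fwd: Sequence[float], bwd: Sequence[float]) -> List[float]:
    # Hypothetical fusion: element-wise average of the two directions
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]


def toy_pipeline(wave: Sequence[float]) -> List[float]:
    local = conv1d_same(wave, sinc_bandpass(0.05, 0.2))  # short-range features
    fwd, bwd = bidirectional_scan(local)                 # long-range context, both directions
    return fuse_average(fwd, bwd)


if __name__ == "__main__":
    y = toy_pipeline([math.sin(0.3 * t) for t in range(64)])
    print(len(y))  # 64
```

Perturbing only the last sample of the input leaves every forward output before the last step unchanged, while the backward outputs at earlier steps change. This is the unidirectional limitation that a bidirectional scan is meant to remove.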
Related papers
- SSAMBA: Self-supervised Audio Representation Learning With Mamba State Space Model (2024) · 0.00
- Audio Mamba: Bidirectional State Space Model For Audio Representation Learning (2024) · 11.58
- ERF-BA-TFD+: A Multimodal Model For Audio-visual Deepfake Detection (2025) · 2.26
- Audio Mamba: Selective State Spaces For Self-supervised Audio Representations (2024) · 9.23
- MFAAN: Unveiling Audio Deepfakes With A Multi-feature Authenticity Network (2023) · 7.81
- What To Remember: Self-adaptive Continual Learning For Audio Deepfake Detection (2023) · 10.48
- Adaptive Re-calibration Of Channel-wise Features For Adversarial Audio Classification (2022) · 0.00
- Exploring WavLM Back-ends For Speech Spoofing And Deepfake Detection (2024) · 4.52