Weakly-supervised Audio Temporal Forgery Localization Via Progressive Audio-language Co-learning Network
2025 · Junyan Wu, Wenbo Xu, Wei Lu, et al.
Abstract
Audio temporal forgery localization (ATFL) aims to find the precise forged regions in partially spoofed audio that has been purposefully modified. Existing ATFL methods rely on training efficient networks with fine-grained annotations, which are costly and challenging to obtain in real-world scenarios. To meet this challenge, we propose a progressive audio-language co-learning network (LOCO) that adopts co-learning and self-supervision to improve localization performance under weak supervision. Specifically, an audio-language co-learning module is first designed to capture forgery consensus features by aligning semantics from temporal and global perspectives. In this module, forgery-aware prompts are constructed from utterance-level annotations together with learnable prompts, which dynamically incorporate semantic priors into temporal content features. In addition, a forgery localization module is applied to produce forgery proposals based on fused f