AdvWave: Stealthy Adversarial Jailbreak Attack Against Large Audio-language Models
2024 · Mintong Kang, Chejian Xu, Bo Li
Abstract
Recent advancements in large audio-language models (LALMs) have enabled speech-based user interactions, significantly enhancing user experience and accelerating the deployment of LALMs in real-world applications. However, ensuring the safety of LALMs is crucial to prevent risky outputs that may raise societal concerns or violate AI regulations. Despite the importance of this issue, research on jailbreaking LALMs remains limited due to their recent emergence and the additional technical challenges they present compared to attacks on DNN-based audio models. Specifically, the audio encoders in LALMs, which involve discretization operations, often lead to gradient shattering, hindering the effectiveness of attacks relying on gradient-based optimizations. The behavioral variability of LALMs further complicates the identification of effective (adversarial) optimization targets. Moreover, enforcing stealthiness constraints on adversarial audio waveforms introduces a reduced, non-convex feasib
Related papers
- Who Can Withstand Chat-audio Attacks? An Evaluation Benchmark For Large Audio-language Models (2024)
- ALIF: Low-cost Adversarial Audio Attacks On Black-box Speech Platforms Using Linguistic Features (2024)
- Targeted Adversarial Examples For Black Box Audio Systems (2018)
- Inaudible Adversarial Perturbations For Targeted Attack In Speaker Recognition (2020)
- Adversarial Attack And Defense Strategies For Deep Speaker Recognition Systems (2020)
- Adversarial Attacks Against Automatic Speech Recognition Systems Via Psychoacoustic Hiding (2018)
- SA: Sliding Attack For Synthetic Speech Detection With Resistance To Clipping And Self-splicing (2022)
- Impact Of Phonetics On Speaker Identity In Adversarial Voice Attack (2025)