Abstract
Audio adversarial perturbations are designed to remain imperceptible to humans while deceiving automatic speech recognition (ASR) models. However, because existing methods operate within the audible frequency range, they remain partially detectable in practice. In this article, we present LaserAdv, a laser-based adversarial attack that injects carefully crafted perturbations via laser signals; the perturbations are superimposed on speech rather than masking it. This design exploits a well-established property of adversarial examples (their ability to mislead models through minimal, often imperceptible, modifications) while preserving the underlying speech, thereby enabling higher attack efficiency and a longer effective attack range. To mitigate the distortion introduced during laser transmission, we propose SAE-TFI, a selective amplitude enhancement method in the time–frequency domain. LaserAdv enables physically realizable attacks that are inaudible and closed-box, and it supports targeted and universal attack settings without requiring signal synchronization. Experimental results demonstrate that a single perturbation can cause DeepSpeech, Whisper, and iFlytek to misinterpret any of the 12,260 voice commands as the target command with an accuracy of up to 100%, 92%, and 88%, respectively. The maximum effective attack distance reaches 120 m.