Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

Abstract

arXiv:2601.14340v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves a 99.52% average ASR while largely preserving non-triggered utility, and remains effective across unseen dialogue datasets and representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing beyond prompt inspection.

Abstract

Related papers