Michael Backes

14 papers · 1161 citations

Most-cited papers

"Do Anything Now": Characterizing And Evaluating In-the-wild Jailbreak Prompts On Large Language Models
2023 · 581 citations
Composite Backdoor Attacks Against Large Language Models
2023 · 95 citations
Instruction Backdoor Attacks Against Customized LLMs
2024 · 81 citations

Topics

Safety & Alignment · Prompting · Evaluation · Training Techniques · Fine-Tuning