AdvBench
Emerging
2 papers using it
12,417 HF downloads
102 HF likes
First seen: 2025
Dataset Card for AdvBench
Paper: Universal and Transferable Adversarial Attacks on Aligned Language Models
Data: AdvBench Dataset
About: AdvBench is a set of 500 harmful behaviors formulated as instructions. These behaviors range over the same themes as the harmful strings setting, but the adversary’s goal is instead to…
🤗 Hugging Face
⚖ License: MIT