
AdvBench

Emerging
2 papers using it
12,417 HF downloads
102 HF likes
First seen: 2025

Dataset Card for AdvBench

Paper: Universal and Transferable Adversarial Attacks on Aligned Language Models
Data: AdvBench Dataset

About

AdvBench is a set of 500 harmful behaviors formulated as instructions. These behaviors range over the same themes as the harmful strings setting, but the adversary’s goal is instead to

Papers using AdvBench (2)
