
AgentHarm

Emerging
Papers using it: 1
First seen: 2026

AgentHarm is a benchmark for evaluating the agent-level safety of large language models by measuring how vulnerable they are to jailbreak attacks.

Papers using AgentHarm (1)

AgentHarm - datasets - cybersecurity