AgentHarm
Emerging
1 paper using it
First seen: 2026
AgentHarm is a benchmark used to evaluate agent-level safety in large language models by assessing their vulnerability to various jailbreak attacks.