AGENTREDBENCH

Emerging
1 paper using it
First seen: 2026

AGENTREDBENCH is a dynamic benchmark that evaluates how well LLM agents withstand 215 subtle authorization attack scenarios across 24 enterprise integrations, focusing on indirect prompt injection threats in tool-use contexts.

Papers using AGENTREDBENCH (1)

AGENTREDBENCH - datasets - cybersecurity