
AgentBench

Canonical
9 papers using it
2023 first seen

AgentBench is a benchmark designed to evaluate the performance and failure modes of agentic AI systems operating continuously in production environments.

Papers using AgentBench (9)
