τ^2-bench

Emerging

3 papers using it

First seen: 2026

τ^2-bench is a benchmark dataset used to evaluate online warning monitors for large language model agents. It assesses how well a monitor predicts risk from traces of agent actions.


Papers using τ^2-bench (3)

PrefixGuard: From LLM-Agent Traces to Online Failure-Warning Monitors (2026)

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL (2026)

SkillX: Automatically Constructing Skill Knowledge Bases for Agents (2026)