
τ²-bench

Emerging
3 papers using it
2026 first seen

τ²-bench is a benchmark dataset used to evaluate online warning monitors for large language model (LLM) agents. It measures how well a monitor can predict risks from traces of the agent's actions.

Papers using τ²-bench (3)

τ²-bench | datasets | llm-papers