
TerminalBench

Emerging
3 papers using it
First seen in 2026

TerminalBench is a dataset used to evaluate how well monitors predict failures in large language model (LLM) agent tasks, based on terminal outcomes.

Papers using TerminalBench (3)
