
Terminal-Bench 2.0

Emerging
3 papers using it
First seen: 2026

Terminal-Bench 2.0 is a benchmark dataset for evaluating large language model agents on long-horizon tasks, including how well they perform when run through different agent harnesses.

Papers using Terminal-Bench 2.0 (3)
