โ† all datasets

Terminal-Bench 2.0

Emerging
12 papers using it
2026 first seen

Terminal-Bench 2.0 is a benchmark dataset for evaluating LLM-based agents on tasks carried out in terminal environments. The papers listed here use it to measure the performance and evolution of self-evolving agents across a range of tasks and metrics.

Papers using Terminal-Bench 2.0 (12)
