Terminal-Bench-2
Emerging
4 papers using it
First seen: 2026
Papers using Terminal-Bench-2 (4)
- Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
- Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace
- Automated Benchmark Auditing for AI Agents and Large Language Models
- Dissecting model behavior through agent trajectories