Terminal-Bench
Emerging
12 papers using it
First seen: 2025
Papers using Terminal-Bench (12)
- Qwen3-Coder-Next Technical Report
- CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion
- Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
- ADK Arena: Evaluating Agent Development Kits via LLM-as-a-Developer
- ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents
- CVE-Factory: Scaling Expert-Level Agentic Tasks for Code Security Vulnerability
- A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
- R2V Agent: Teaching SLMs When to Ask for Help
- CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing
- From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python
- AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
- SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent