SWE-bench Pro
Emerging
12 papers using it
74,071 HF downloads
128 HF likes
First seen: 2025
Dataset Summary
SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.
Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf
See the related evaluation GitHub repository: https://github.com/scaleapi/SWE-bench_Pro-os
Datas
Papers using SWE-bench Pro (12)
- SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks?
- Laguna M.1/XS.2 Technical Report
- Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
- SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution
- CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents
- SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents
- Confucius Code Agent: Scalable Agent Scaffolding for Real-World Codebases
- Toward Training Superintelligent Software Agents through Self-Play SWE-RL
- The Dual-State Architecture for Reliable LLM Agents
- SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
- Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale
- FastContext: Training Efficient Repository Explorer for Coding Agents