
SWE-bench Pro

Emerging
12 papers using it
74,071 HF downloads
128 HF likes
2025 first seen

Dataset Summary
SWE-Bench Pro is a challenging, enterprise-level dataset for testing agents' ability on long-horizon software engineering tasks.
Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf
See the related evaluation GitHub repository: https://github.com/scaleapi/SWE-bench_Pro-os
Datas

Papers using SWE-bench Pro (12)
