SWE-bench Verified

Canonical
1 paper using it
69,597 HF downloads
95 HF likes
2026 first seen

Dataset Summary

SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process. The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification, using post-PR behavior as the reference solution. The original… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified.
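To make the evaluation idea concrete, here is a minimal sketch of what "unit test verification" could look like once a candidate patch has been applied and the tests have been run. It is an illustration only, not the official SWE-bench harness: the function names (`is_resolved`, `resolve_rate`) and the parameters (`test_outcomes`, `must_now_pass`, `must_still_pass`) are hypothetical, and treating a missing test result as a failure is an assumption made for this example.

```python
from typing import Iterable, Mapping


def is_resolved(
    test_outcomes: Mapping[str, bool],
    must_now_pass: Iterable[str],
    must_still_pass: Iterable[str],
) -> bool:
    """Return True when every required test passed after applying a candidate patch.

    test_outcomes maps a test name to True (passed) or False (failed).
    must_now_pass lists tests expected to pass only with the post-PR behavior.
    must_still_pass lists tests that passed before the PR and must not regress.
    All three names are hypothetical and used for illustration.
    """
    required = list(must_now_pass) + list(must_still_pass)
    # Assumption: a test with no recorded outcome counts as a failure.
    return all(test_outcomes.get(name, False) for name in required)


def resolve_rate(flags: Iterable[bool]) -> float:
    """Fraction of samples marked resolved; 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0
```

Under these assumptions, a system's score on the 500 samples would be `resolve_rate` applied to the per-sample `is_resolved` results.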

Papers using SWE-bench Verified (1)
