โ† all datasets

SWE-bench

Canonical
43 papers using it
2024 first seen

SWE-bench is a dataset of 64,380 runs from 126 software engineering agent configurations across 43 frameworks. It is used to evaluate differences in behavior and performance among LLM-based software engineering agents.

Papers using SWE-bench (43)
