
SWE-bench

Canonical
91 papers using it
2023 first seen

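SWE-bench is a benchmark of software engineering tasks drawn from real GitHub issues: given a repository and an issue description, a system must produce a patch that makes the repository's tests pass. It has also been used to evaluate agent development kits (ADKs), by measuring under a controlled methodology how effective the LLM coding agents they produce are.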

Papers using SWE-bench (91)
