SWE-bench Verified

Emerging

6 papers using it

70,231 HF downloads

94 HF likes

2025 first seen

Dataset Summary: SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process. The dataset collects 500 test I
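A minimal sketch of loading the dataset and checking it against the 500-sample count in the summary. The repository id `princeton-nlp/SWE-bench_Verified` and the helper names `load_verified` and `has_expected_size` are assumptions for illustration and are not stated on this page. Loading requires the Hugging Face `datasets` package and network access.

```python
# Assumption: this Hugging Face repository id is not given in the page text.
DATASET_ID = "princeton-nlp/SWE-bench_Verified"

# From the summary: 500 human-validated samples from the SWE-bench test set.
EXPECTED_SIZE = 500


def load_verified(split: str = "test"):
    """Load the dataset from the Hugging Face Hub (hypothetical helper).

    The import is done lazily so this module can be imported without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def has_expected_size(n_rows: int) -> bool:
    """Return True when the row count matches the size the summary describes."""
    return n_rows == EXPECTED_SIZE


# Example usage (downloads data, so it is left commented out):
# ds = load_verified()
# print(len(ds), has_expected_size(len(ds)))
```

Checking the row count after loading is a cheap way to confirm that the split in hand is the Verified subset rather than the full SWE-bench test set.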

🤗 Hugging Face

Papers using SWE-bench Verified (6)

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness (2026)

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents (2026)

The Limits of Long-Context Reasoning in Automated Bug Fixing (2026)

Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement (2025)

SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution (2025)

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs (2025)