SWE-bench Verified
Emerging
6 papers using it
70,231 HF downloads
94 HF likes
First seen: 2025
Dataset Summary
SWE-bench Verified is a subset of 500 samples from the SWE-bench test set that have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process. The dataset collects 500 test I…
Papers using SWE-bench Verified (6)
- HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
- Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
- The Limits of Long-Context Reasoning in Automated Bug Fixing
- Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
- Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs