SWE-bench Verified
Canonical
Papers using it: 40
First seen: 2025
Papers using SWE-bench Verified (40)
- HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
- Socratic-SWE: Self-Evolving Coding Agents via Trace-Derived Agent Skills
- HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness
- Decentralized Multi-Agent Systems with Shared Context
- SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context
- From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws
- The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents
- Open-SWE-Traces: Advancing Dual-Mode Multilingual Distillation for Software Engineering Agents
- Long Live the Librarian! A Persistent Search Sub-Agent for Energy-Efficient Multi-Agent Software Engineering Systems
- Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses
- Hybrid-gym: Training Coding Agents To Generalize Across Tasks
- Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
- Frontier Coding Agents Use Metaprogramming to Adapt to Unfamiliar Programming Languages
- ProcCtrlBench: Evaluating Process-Level Defects and Control Preservation in LLM Coding Agents
- Automated Benchmark Auditing for AI Agents and Large Language Models
- CoMem: Context Management with A Decoupled Long-Context Model
- Swe-bench-cl: Continual Learning For Coding Agents
- Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime
- Swe-protégé: Learning To Selectively Collaborate With An Expert Unlocks Small Language Models As Software Engineering Agents
- Evaluating Plan Compliance In Autonomous Programming Agents
- Group-evolving Agents: Open-ended Self-improvement Via Experience Sharing
- CRANE: Constrained Reasoning Injection for Code Agents via Nullspace Editing
- Ratchet: A Minimal Hygiene Recipe for Self-Evolving LLM Agents
- Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents
- SWE-Edit: Rethinking Code Editing for Efficient SWE-Agent
- Coherence Collapse: Diagnosing Why Code Agents Fail After Reaching the Right Code
- Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents
- SWE-Universe: Scale Real-World Verifiable Environments to Millions
- SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
- EvoMAS: Evolutionary Generation of Multi-Agent Systems
- Pull Requests as a Training Signal for Repo-Level Code Editing
- Toward Training Superintelligent Software Agents through Self-Play SWE-RL
- Self-Abstraction from Grounded Experience for Plan-Guided Policy Refinement
- R2e-gym: Procedural Environments And Hybrid Verifiers For Scaling Open-weights SWE Agents
- Putting It All Into Context: Simplifying Agents With Lclms
- A Self-improving Coding Agent
- Co-patcher: Collaborative Software Patching With Component(s)-specific Small Reasoning Models
- SWE-EVO: Benchmarking Coding Agents In Long-horizon Software Evolution Scenarios
- Guided Search Strategies In Non-serializable Environments With Applications To Software Engineering Agents
- Establishing Best Practices For Building Rigorous Agentic Benchmarks