SWE-bench
Emerging · 6 papers using it
First seen: 2024
Papers using SWE-bench (6)
- The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management
- SWE-MERA: A Dynamic Benchmark for Agenticly Evaluating Large Language Models on Software Engineering Tasks
- Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
- SWE-bench Goes Live!
- ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases
- SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications