WebApp-1K
Emerging, 6 papers using it
First seen: 2024
WebApp1K is a benchmark of 1,000 diverse challenges across 20 application domains. It evaluates large language models (LLMs) on test-driven development (TDD) tasks by assessing their ability to generate functional code from test cases.
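To make the "tests as prompt" idea concrete, here is a minimal sketch of how a pass rate over such challenges could be computed. It is not the benchmark's actual harness: the names `passes`, `pass_rate`, `check_add`, and `check_slugify` are hypothetical, the challenges are toy Python functions rather than web app tasks, and generated code is run with a plain `exec` where a real evaluation would use a sandbox.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a challenge: a test function that receives the
# namespace produced by executing the model's code and raises on failure.
Challenge = Callable[[Dict[str, object]], None]


def passes(generated_code: str, test: Challenge) -> bool:
    """Execute generated code in a fresh namespace and run the test against it."""
    namespace: Dict[str, object] = {}
    try:
        exec(generated_code, namespace)  # assumption: no sandboxing in this sketch
        test(namespace)
    except Exception:
        # Syntax errors, missing names, and failed assertions all count as a fail.
        return False
    return True


def pass_rate(solutions: List[str], tests: List[Challenge]) -> float:
    """Fraction of challenges whose generated code passes its test."""
    if len(solutions) != len(tests):
        raise ValueError("need exactly one solution per test")
    if not tests:
        return 0.0
    return sum(passes(s, t) for s, t in zip(solutions, tests)) / len(tests)


def check_add(ns: Dict[str, object]) -> None:
    assert ns["add"](2, 3) == 5


def check_slugify(ns: Dict[str, object]) -> None:
    assert ns["slugify"]("Hello World") == "hello-world"


# Two toy "model outputs": the first is correct, the second ignores spaces.
solutions = [
    "def add(a, b):\n    return a + b",
    "def slugify(s):\n    return s.lower()",
]

print(pass_rate(solutions, [check_add, check_slugify]))  # 0.5
```

In this framing the test is the whole specification: the model sees only the test and a solution counts only if the test passes.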
Papers using WebApp-1K (6)
- Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
- Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation
- WebApp1K: A Practical Code-Generation Benchmark for Web App Development
- Insights from Benchmarking Frontier Language Models on Web App Code Generation
- Insights from Benchmarking Frontier Language Models on Web App Code Generation
- A Case Study of Web App Coding with OpenAI Reasoning Models