WebApp-1K

Emerging

5 papers using it

First seen in 2024

WebApp-1K is a benchmark of 1,000 diverse challenges across 20 application domains. It evaluates large language models (LLMs) on test-driven development (TDD) tasks by assessing their ability to generate functional code from test cases.


Papers using WebApp-1K (5)

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation · 2025 · 1 cite

WebApp1K: A Practical Code-Generation Benchmark for Web App Development · 2024 · 1 cite

Insights from Benchmarking Frontier Language Models on Web App Code Generation · 2024

A Case Study of Web App Coding with OpenAI Reasoning Models · 2024