
WebApp-1K

Emerging
6 papers using it
First seen: 2024

WebApp1K is a benchmark of 1,000 diverse challenges spanning 20 application domains. It evaluates large language models (LLMs) on test-driven development (TDD) tasks by assessing their ability to generate functional code from test cases.

Papers using WebApp-1K (6)
