
ArtifactsBench

Emerging
3 papers using it
110 HF downloads
13 HF likes
First seen: 2025

ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation

Tencent Hunyuan Team

📖 Paper • 🏠 Home Page • 💻 Code • 🏆 Leaderboard • 📜 Citation

Figure 1: Automation level versus human alignment across evaluation frameworks. The red star marks the fully manual WebDev Arena (100% human effort)

Papers using ArtifactsBench (3)
