ArtifactsBench
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation
Tencent Hunyuan Team
Paper • Home Page • Code • Leaderboard • Citation
Figure 1: Automation level versus human alignment across evaluation frameworks. The red star marks the fully manual WebDev Arena (100% human effort).
Hugging Face · License: cc-by-nc-4.0