CRUXEval

Canonical
10 papers using it
3,193 HF downloads
21 HF likes
2024 first seen

CRUXEval: Code Reasoning, Understanding, and Execution Evaluation

🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard • 🔎 Sample Explorer

CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation) is a benchmark of 800 Python functions and input-output pairs. The benchmark consists of two tasks, CRUXEval-I (i

Papers using CRUXEval (10)
