HumanEval humaneval-3 Leaderboard

#	Model	Success rate	Paper
1	From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence	95.00	—
2	From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence	95.00	—
3	Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback	75.00	—
4	Large Language Model Guided Self-Debugging Code Generation	5.70	—
