HumanEval humaneval-2 Leaderboard

#	Model	Accuracy	Paper
1	Multi-Programming Language Ensemble for Code Generation in Large Language Model	96.25	—
2	Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach	90.20	—
3	CoCoNUT: Structural Code Understanding does not fall out of a tree	47.00	—
4	Guided Code Generation with LLMs: A Multi-Agent Framework for Complex Code Tasks	23.79	—
5	From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging	18.90	—
6	Fixing Function-Level Code Generation Errors for Foundation Large Language Models	7.50	—
7	FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system	6.10	—
8	Prompt Alchemy: Automatic Prompt Refinement for Enhancing Code Generation	5.00	—
9	Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection	2.40	—
