HumanEval humaneval-4 Leaderboard
Auto-discovered from papers reporting HumanEval (Improvement). · Metric: Improvement (higher is better)
| # | Method (paper title) | Improvement |
|---|---|---|
| 1 | Self-Correcting Code Generation Using Small Language Models | 27.70 |
| 2 | CodeGrad: Integrating Multi-Step Verification with Gradient-Based LLM Refinement | 27.00 |
| 3 | RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance | 9.80 |
| 4 | CELI: Controller-Embedded Language Model Interactions | 4.90 |
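Because the metric is "higher is better", ranks follow from sorting the reported Improvement values in descending order. The sketch below reproduces the ordering from the four reported scores. It assumes simple ordinal ranking. The `rank` helper is hypothetical and not part of any leaderboard tooling, and how the actual leaderboard breaks ties is not stated. Here, tied scores keep their input order and get consecutive ranks.

```python
# Reported (method, improvement) pairs, deliberately given out of order.
entries = [
    ("CELI: Controller-Embedded Language Model Interactions", 4.90),
    ("Self-Correcting Code Generation Using Small Language Models", 27.70),
    ("RGD: Multi-LLM Based Agent Debugger via Refinement and Generation Guidance", 9.80),
    ("CodeGrad: Integrating Multi-Step Verification with Gradient-Based LLM Refinement", 27.00),
]


def rank(entries):
    """Hypothetical helper: return (rank, method, score) tuples, highest score first.

    sorted() is stable, so tied scores keep their input order and receive
    consecutive ordinal ranks (an assumption; the leaderboard's tie rule is unknown).
    """
    ordered = sorted(entries, key=lambda e: e[1], reverse=True)
    return [(i, name, score) for i, (name, score) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Print rows in the same layout as the leaderboard table.
    for r, name, score in rank(entries):
        print(f"| {r} | {name} | {score:.2f} |")
```

A dense or competition ranking (equal ranks for equal scores) would need an explicit tie check rather than plain `enumerate`.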