| # | Model | pass@1 | Paper |
|---|-------|--------|-------|
| 1 | Planning-Driven Programming: A Large Language Model Programming Workflow | 98.20 | - |
| 2 | SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution | 95.70 | - |
| 3 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | 95.10 | - |
| 4 | CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models | 95.10 | - |
| 5 | Poison with Style: A Practical Poisoning Attack on Code Large Language Models | 95.00 | - |
| 6 | Multi-task Code LLMs: Data Mix or Model Merge? | 92.70 | - |
| 7 | ARCS: Agentic Retrieval-Augmented Code Synthesis with Iterative Refinement | 87.20 | - |
| 8 | BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation | 83.50 | - |
| 9 | CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation | 77.80 | - |
| 10 | Adaptive Confidence Gating in Multi-Agent Collaboration for Efficient and Optimized Code Generation | 70.12 | - |
| 11 | CREME: Robustness Enhancement of Code LLMs via Layer-Aware Model Editing | 63.00 | - |
| 12 | Modularization is Better: Effective Code Generation with Modular Prompting | 58.10 | - |
| 13 | Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach | 35.71 | - |
| 14 | Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality | 31.22 | - |
| 15 | Context-Augmented Code Generation Using Programming Knowledge Graphs | 20.00 | - |
| 16 | Enhancing Code Generation via Bidirectional Comment-Level Mutual Grounding | 17.10 | - |
| 17 | RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing | 5.50 | - |
| 18 | A Mixture of Linear Corrections Generates Secure Code | 2.10 | - |

HumanEval Leaderboard