APPS Leaderboard
Auto-discovered from papers reporting APPS (pass@1) · Metric: pass@1 (higher is better)
| # | Model (paper title) | pass@1 (%) |
|---|---|---|
| 1 | Planning-Driven Programming: A Large Language Model Programming Workflow | 62.60 |
| 2 | SolidCoder: Bridging the Mental-Reality Gap in LLM Code Generation through Concrete Execution | 26.70 |
| 3 | CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | 22.00 |
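
For readers unfamiliar with the metric, the sketch below shows one common way a pass@1 score like those in the table can be computed. It assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass every test; for k = 1 this reduces to c / n. The function names `pass_at_k` and `benchmark_pass_at_k` are hypothetical, and the listed papers may use a different protocol (for example, a single greedy sample per problem), so treat this as an illustration rather than the method behind these numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Estimate pass@k for one problem.

    n: number of generated samples, c: samples passing all tests, k: budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples contains
    # no passing sample is C(n - c, k) / C(n, k); pass@k is its complement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over (n, c) pairs, returned as a percentage."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return 100.0 * sum(scores) / len(scores)
```

With one sample per problem, `benchmark_pass_at_k` is simply the percentage of problems whose single generated solution passes all tests, which is why higher is better.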