EvalPlus

Emerging

8 papers using it

First seen in 2024

EvalPlus is a benchmark for evaluating code generation models. It extends existing benchmarks such as HumanEval and MBPP with substantially more test cases (HumanEval+ and MBPP+) so that the functional correctness of generated code can be checked more rigorously.

Papers using EvalPlus (8)

LLM-Powered Test Case Generation for Detecting Bugs in Plausible Programs · 2024 · 3 cites

Unify and Triumph: Polyglot, Diverse, and Self-Consistent Generation of Unit Tests with LLMs · 2025 · 1 cite

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation · 2026

NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs · 2026

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement · 2024 · 4 cites

Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation · 2024 · 2 cites

Towards Large Language Model Aided Program Refinement · 2024 · 1 cite

Beyond Code Generation: Assessing Code LLM Maturity with Postconditions · 2024