
MultiPL-E

Canonical

Used by 13 papers

First seen: 2023

A multi-language translation of the HumanEval and MBPP benchmarks, used to evaluate code generation across 18+ programming languages.


Papers using MultiPL-E (13)

SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation (2025) · 1 citation

Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment (2025)

Iterative Self-Training for Code Generation via Reinforced Re-Ranking (2025)

ReflectionCoder: Learning from Reflection Sequence for Enhanced One-off Code Generation (2024)

Code Llama: Open Foundation Models for Code (2023) · 414 citations

SantaCoder: don't reach for the stars! (2023) · 54 citations

XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts (2024) · 2 citations

A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks (2024) · 2 citations

Instruction Fusion: Advancing Prompt Evolution through Hybridization (2023)

InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct (2024)

USCD: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding (2024)

ExecRepoBench: Multi-level Executable Code Completion Evaluation (2024)

PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation (2024)

MultiPL-E dataset: papers, benchmarks & downloads · AI for Code