
HumanEval-XL

Emerging

8 papers using it

2023 first seen

A collection of cross-lingual benchmarks for code generation.


Papers using HumanEval-XL (8)

Strategies for Guiding LLMs to Use Software Design Patterns: A Case of Singleton · 2026

SwiftEval: Developing a Language-Specific Benchmark for LLM-generated Code Evaluation · 2025 · 1 cite

Enhancing LLM-Based Code Translation with Verified Multi-Semantic Representations · 2026

Programming Language Confusion: When Code LLMs Can't Keep their Languages Straight · 2025

Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMs · 2024 · 9 cites

HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization · 2024 · 4 cites

CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model · 2023 · 2 cites

InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation · 2024 · 1 cite
