
BigCodeBench

Canonical

29 papers using it

First seen: 2024

The dataset has 2 variants:

BigCodeBench-Complete: code completion based on structured docstrings.
BigCodeBench-Instruct: code generation based on NL-oriented instructions.

The overall statistics of the dataset are as follows:

                     Complete   Instruct
# Task               1140       1140
# Avg. Test Cases    5.6        5.6
# Avg. Cove


Papers using BigCodeBench (29)

A Survey on Large Language Models for Code Generation · 2024 · 59 cites

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding · 2025 · 6 cites

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding · 2025 · 5 cites

FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement · 2026

Conv-to-Bench: Evaluating Language Models Via User-Assistant Dialogues In Code Tasks · 2026

Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach · 2025 · 2 cites

Large Language Model Guided Self-Debugging Code Generation · 2025 · 2 cites

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs · 2025 · 1 cite

Using Semantic Distance to Estimate Uncertainty in LLM-Based Code Generation · 2026

ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning · 2026

Consistency Meets Verification: Enhancing Test Generation Quality in Large Language Models Without Ground-Truth Solutions · 2026

NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs · 2026

DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation · 2026

FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation · 2026

InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration · 2025

TALM: Dynamic Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation · 2025

Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation · 2025

Alignment with Fill-In-the-Middle for Enhancing Code Generation · 2025

IterPref: Focal Preference Learning for Code Generation via Iterative Debugging · 2025

CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts · 2025

Teaching Your Models to Understand Code via Focal Preference Alignment · 2025

Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting · 2025

Verbal Process Supervision Elicits Better Coding Agents · 2025

ACECODER: Acing Coder RL via Automated Test-Case Synthesis · 2025

UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance · 2025

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions · 2024 · 20 cites

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis · 2024 · 1 cite

Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining · 2024

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs · 2024
