MBPP
Canonical
21 papers using it
180,817 HF downloads
230 HF likes
2024 first seen
Dataset Card for Mostly Basic Python Problems (mbpp)
Dataset Summary
The benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task descrip
Hugging Face · cc-by-4.0
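A minimal sketch of how a problem of this kind might be checked, assuming a record pairs a task description with a reference solution and a few assert-style tests. The field names `text`, `code`, and `test_list`, the sample problem, and the helper `passes_all_tests` are illustrative assumptions, not taken from the card.

```python
# Hypothetical MBPP-style record: field names and contents are assumptions
# made for illustration, not copied from the dataset.
problem = {
    "text": "Write a function to add two numbers.",
    "code": "def add(a, b):\n    return a + b",
    "test_list": [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
        "assert add(0, 0) == 0",
    ],
}


def passes_all_tests(solution_src: str, tests: list[str]) -> bool:
    """Execute the solution source, then each assert-style test, in one fresh namespace."""
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # defines the candidate function
        for test in tests:
            exec(test, namespace)  # raises AssertionError on a failed check
    except Exception:
        return False
    return True


# The reference solution should pass; a deliberately wrong one should not.
reference_ok = passes_all_tests(problem["code"], problem["test_list"])
buggy_ok = passes_all_tests("def add(a, b):\n    return a - b", problem["test_list"])
print(reference_ok, buggy_ok)
```

Running model-generated code through `exec` is unsafe outside a sandbox, so a real evaluation would isolate execution and add a timeout.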
Papers using MBPP (18)
- Heteroskedastic Signals in Budgeted LLM Verification: Structural Heterogeneity Limits Optimization Gains
- ACE: Self-Evolving LLM Coding Framework via Adversarial Unit Test Generation and Preference Optimization
- Fast-dLLM++: Fréchet Profile Decoding for Faster Diffusion LLM Inference
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs
- Think Anywhere in Code Generation
- Locally Confident, Globally Stuck: The Quality-Exploration Dilemma in Diffusion Language Models
- Guided Collaboration in Heterogeneous LLM-Based Multi-Agent Systems via Entropy-Based Understanding Assessment and Experience Retrieval
- TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs
- From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
- Efficient Code LLM Training via Distribution-Consistent and Diversity-Aware Data Selection
- Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
- Learning to Insert [PAUSE] Tokens for Better Reasoning
- Learning to Generate Unit Tests for Automated Debugging
- CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
- LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
- dParallel: Learnable Parallel Decoding for dLLMs
- Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
- OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs