BigCodeBench
Canonical
33 papers using it
First seen: 2024
Papers using BigCodeBench (33)
- A Survey on Large Language Models for Code Generation
- KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
- FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement
- Conv-to-Bench: Evaluating Language Models Via User-Assistant Dialogues In Code Tasks
- Large Language Model Guided Self-Debugging Code Generation
- Enhancing LLM-Based Code Generation with Complexity Metrics: A Feedback-Driven Approach
- ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
- Consistency Meets Verification: Enhancing Test Generation Quality in Large Language Models Without Ground-Truth Solutions
- NOIR: Privacy-Preserving Generation of Code with Open-Source LLMs
- DAJ: Data-Reweighted LLM Judge for Test-Time Scaling in Code Generation
- FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation
- InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration
- TALM: Dynamic Tree-Structured Multi-Agent Framework with Long-Term Memory for Scalable Code Generation
- Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation
- Alignment with Fill-In-the-Middle for Enhancing Code Generation
- IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
- CodeMixBench: Evaluating Large Language Models on Code Generation with Code-Mixed Prompts
- OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
- Teaching Your Models to Understand Code via Focal Preference Alignment
- KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
- Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting
- Verbal Process Supervision Elicits Better Coding Agents
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis
- UnitCoder: Scalable Iterative Code Synthesis with Unit Test Guidance
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
- Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
- DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
- Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
- ACECODER: Acing Coder RL via Automated Test-Case Synthesis
- Large Language Model Guided Self-Debugging Code Generation
- Verbal Process Supervision Elicits Better Coding Agents
- ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning