DebugBench

Emerging
3 papers using it
825 HF downloads
31 HF likes
2024 first seen

Dataset Summary

DebugBench is a Large Language Model (LLM) debugging benchmark introduced in the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". We collect code snippets from the LeetCode community and implant bugs into source data with GPT-4. The project is also open-sourced as a GitHub rep

Papers using DebugBench (3)