
BaxBench

Emerging
2 papers using it
57 HF downloads
5 HF likes
2025 first seen

Dataset Summary

BaxBench is a coding benchmark that measures the ability of code generation models and agents to generate correct and secure code. It consists of 392 backend development tasks, built by combining 28 scenarios that describe the backend functionalities to implement and 14 backend
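
The task count is consistent with a full cross product of the two axes: 28 × 14 = 392. The following is a minimal sketch of that pairing, not the benchmark's actual construction code. The identifiers `scenario_XX` and `backend_XX` are hypothetical placeholders, since the real scenario and backend names are not listed here.

```python
from itertools import product

# Hypothetical placeholder names; the real scenario and backend
# identifiers are not given in the summary above.
scenarios = [f"scenario_{i:02d}" for i in range(1, 29)]  # 28 scenarios
backends = [f"backend_{j:02d}" for j in range(1, 15)]    # 14 backend options

# Assumption: every scenario is paired with every backend option,
# yielding one task per (scenario, backend) pair.
tasks = [{"scenario": s, "backend": b} for s, b in product(scenarios, backends)]

print(len(tasks))  # 392
```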

Papers using BaxBench (2)
