Cybench

Emerging
3 papers using it
First seen: 2024

Cybench is a dataset of Python challenges used to evaluate the robustness and generalization of agentic large language models (LLMs) by applying semantics-preserving program transformations.

Papers using Cybench (3)