Cybench
Emerging
3 papers using it
First seen: 2024
Cybench is a dataset of Python challenges used to evaluate the robustness and generalization of agentic large language models (LLMs) under semantics-preserving program transformations.