CTF

Emerging

1 paper using it

First seen: 2026

The CTF (Capture The Flag) benchmark consists of cybersecurity challenges used to evaluate the robustness and generalization of agentic large language models (LLMs). It does this by applying semantics-preserving transformations to the challenges' source code.
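As a rough illustration of what a semantics-preserving transformation means, the sketch below consistently renames local identifiers in a toy Python "challenge" and then checks that the original and the variant behave identically on a few inputs. This is an assumption-laden example, not the benchmark's actual pipeline: the transformations CTF uses, its challenge format, and its tooling are not described here, and the names `RenameIdentifiers`, `transform`, `equivalent_on`, and the `check` challenge are hypothetical. It requires Python 3.9+ for `ast.unparse`.

```python
import ast


class RenameIdentifiers(ast.NodeTransformer):
    """Consistently rename variables and parameters (a semantics-preserving edit).

    Hypothetical helper for illustration only; function names themselves
    (ast.FunctionDef.name) are left untouched.
    """

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Rename every load/store of a mapped variable.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Rename function parameters, then visit any annotation.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        self.generic_visit(node)
        return node


def transform(source, mapping):
    """Return source code with identifiers renamed according to `mapping`."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def run(source, func_name, *args):
    """Execute `source` in a fresh namespace and call `func_name(*args)`."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


def equivalent_on(src_a, src_b, func_name, inputs):
    """True if both programs return the same result for every input."""
    return all(run(src_a, func_name, x) == run(src_b, func_name, x) for x in inputs)


# Hypothetical toy challenge: solving it depends on behavior, not on names.
original = '''
def check(password):
    secret = "flag{demo}"
    return password == secret
'''

variant = transform(original, {"password": "p0", "secret": "k"})
```

For example, `equivalent_on(original, variant, "check", ["flag{demo}", "guess"])` returns `True`: the variant looks different on the surface but has the same behavior, so an agent that truly solves the challenge should also solve its transformed siblings.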


Papers using CTF (1)

Capture The Flags: Family-based Evaluation Of Agentic LLMs Via Semantics-preserving Transformations (2026)