200-task primary dataset
Emerging · 1 paper using it
First seen: 2026
The '200-task primary dataset' is a diverse set of coding tasks used to evaluate how reliably large language models generate secure code across different programming languages and prompting strategies.