GPT-OSS-Safeguard-20B

Emerging

1 paper using it

First seen: 2026

GPT-OSS-Safeguard-20B is a benchmark dataset used to evaluate adversarial attack algorithms against large language models, specifically for jailbreaking and prompt injection.

Papers using GPT-OSS-Safeguard-20B (1)

Claudini: Autoresearch Discovers State-of-the-art Adversarial Attack Algorithms for LLMs (2026)