← all datasets

AlignSentinel benchmark

Emerging
1papers using it
2026first seen

The AlignSentinel benchmark is a systematic dataset that contains inputs categorized into three classes—misaligned instructions, aligned instructions, and non-instruction inputs—to evaluate the effectiveness of a classifier in detecting prompt injection attacks in language models.

Papers using AlignSentinel benchmark (1)

AlignSentinel benchmark — datasets — cybersecurity