AlignSentinel benchmark

Emerging

1papers using it

2026first seen

The AlignSentinel benchmark is a systematic dataset that contains inputs categorized into three classes—misaligned instructions, aligned instructions, and non-instruction inputs—and is used to evaluate the effectiveness of classifiers in detecting prompt injection attacks in language models.

🔎 Find this dataset

Papers using AlignSentinel benchmark (1)

AlignSentinel: Alignment-Aware Detection of Prompt Injection Attacks2026