AlignSentinel benchmark
Emerging1papers using it
2026first seen
The AlignSentinel benchmark is a systematic dataset that contains inputs categorized into three classes—misaligned instructions, aligned instructions, and non-instruction inputs—to evaluate the effectiveness of a classifier in detecting prompt injection attacks in language models.