
AILuminate

Emerging
1 paper using it
First seen: 2026

AILuminate is a dataset and benchmark used to evaluate how effectively safety judges identify harmful outputs from large language models (LLMs) in user-model conversations.
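This page does not document AILuminate's file format or scoring protocol. As a rough illustration of what evaluating a safety judge typically involves, the sketch below compares a judge's unsafe/safe verdicts against ground-truth labels and reports precision, recall, F1, and false-positive rate, treating "harmful" as the positive class. The `LabeledExchange` record, the `evaluate_judge` and `keyword_judge` functions, and the `SAMPLE` data are all hypothetical. They are not AILuminate's real schema, judges, or official metrics.

```python
from dataclasses import dataclass


# Hypothetical record for one user-model exchange with a ground-truth label.
# This is NOT AILuminate's actual schema; it only illustrates the idea.
@dataclass
class LabeledExchange:
    prompt: str
    response: str
    is_harmful: bool  # True if the model response is labeled harmful


def evaluate_judge(judge, exchanges):
    """Score a judge (a callable returning True for 'unsafe') against labels.

    Harmful is the positive class. Ratios default to 0.0 when undefined.
    """
    tp = fp = tn = fn = 0
    for ex in exchanges:
        flagged = judge(ex.prompt, ex.response)
        if flagged and ex.is_harmful:
            tp += 1  # correctly flagged a harmful response
        elif flagged:
            fp += 1  # flagged a safe response
        elif ex.is_harmful:
            fn += 1  # missed a harmful response
        else:
            tn += 1  # correctly passed a safe response
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1,
            "false_positive_rate": fpr}


def keyword_judge(prompt, response):
    """Hypothetical toy judge: flags any response containing a fixed phrase."""
    return "harmful instructions" in response.lower()


# Hypothetical toy data with placeholder text, not drawn from AILuminate.
SAMPLE = [
    LabeledExchange("q1", "Here are detailed harmful instructions", True),
    LabeledExchange("q2", "Sure, harmful instructions follow", True),
    LabeledExchange("q3", "A subtly unsafe answer", True),                # missed
    LabeledExchange("q4", "I won't give harmful instructions.", False),   # over-flagged
    LabeledExchange("q5", "A benign, helpful answer", False),
]

if __name__ == "__main__":
    print(evaluate_judge(keyword_judge, SAMPLE))
```

Reporting recall alongside the false-positive rate matters for judges: in the toy data the keyword judge misses the subtly unsafe answer and flags a refusal that merely mentions the phrase, two different failure modes that a single accuracy figure (3/5 here) would blur together.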
