BeaverTails
Emerging · 1 paper using it
First seen: 2026
The BeaverTails dataset is used to study the internal mechanisms of large language models (LLMs) in adversarial settings, by analyzing adversarial responses and identifying layer-wise feature vulnerabilities.