BeaverTails
Emerging · 1 paper using it
First seen: 2026
The 'BeaverTails' dataset is a benchmark used to evaluate the effectiveness of defenses against adversarial attacks on open-weight large language models (LLMs).