Benchmarks for Stateful Defenses (BSD)
Emerging, 1 paper using it
First seen in 2025
Benchmarks for Stateful Defenses (BSD) is a data generation pipeline that automates the evaluation of covert attacks and the corresponding defenses. It provides datasets that are challenging for models to refuse and supports the assessment of stateful defenses against misuse.