AdvBench

Emerging

2 papers using it

2025 first seen

AdvBench is a benchmark for evaluating the safety and trustworthiness of large language models by assessing how they respond to potentially harmful prompts.


Papers using AdvBench (2)

Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization · 2025 · 2 cites

Responsible Federated LLMs via Safety Filtering and Constitutional AI · 2025