JailbreakBench
Emerging
6 papers using it
540 HF downloads
5 HF likes
2025 first seen
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models
Paper: JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Data: JailbreakBench-HFLink
About: JailbreakBench is an open-source robustness benchmark for jailbreaking large language models (LLMs). The goal of this
Papers using JailbreakBench (6)
- Reflect-Guard: Enhancing LLM Safeguards against Adversarial Prompts via Logical Self-Reflection
- Refusal Steering: Fine-grained Control over LLM Refusal Behaviour for Sensitive Topics
- Tempest: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search
- When Prompt Optimization Becomes Jailbreaking: Adaptive Red-Teaming of Large Language Models
- Structured Visual Narratives Undermine Safety Alignment In Multimodal Large Language Models
- Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments