Jun Sun
12 papers · 192 citations
Most-cited papers
- Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing · 2024 · 75 citations
- ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation · 2024 · 49 citations
Topics