Jun Sun
12 papers · 192 citations
Most-cited papers
- Defending Large Language Models Against Jailbreak Attacks Via Layer-specific Editing · 2024 · 75 citations
- ALI-Agent: Assessing LLMs' Alignment With Human Values Via Agent-based Evaluation · 2024 · 49 citations
- BackdoorLLM: A Comprehensive Benchmark For Backdoor Attacks And Defenses On Large Language Models · 2024 · 20 citations
- Adversarial Representation Engineering: A General Model Editing Framework For Large Language Models · 2024 · 16 citations
- Do Influence Functions Work On Large Language Models? · 2024 · 14 citations
Topics