Tianyu Pang
11 papers · 511 citations
Most-cited papers
- Weak-to-strong Jailbreaking On Large Language Models · 2024 · 109 citations
- Improved Techniques For Optimization-based Jailbreaking On Large Language Models · 2024 · 101 citations
- Improved Few-shot Jailbreaking Can Circumvent Aligned Language Models And Their Defenses · 2024 · 79 citations
- Bootstrapping Language Models With DPO Implicit Rewards · 2024 · 53 citations
Topics