Ruoxi Jia
12 papers · 1560 citations
Most-cited papers
- Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! · 2023 · 1086 citations
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal · 2024 · 168 citations
- Algorithm Of Thoughts: Enhancing Exploration Of Ideas In Large Language Models · 2023 · 108 citations
- RigorLLM: Resilient Guardrails For Large Language Models Against Undesired Content · 2024 · 76 citations
- Practical Membership Inference Attacks Against Large-scale Multi-modal Models: A Pilot Study · 2023 · 49 citations
Topics