Jing Shao

21 papers · 743 citations

Most-cited papers

Salad-bench: A Hierarchical And Comprehensive Safety Benchmark For Large Language Models
2024 · 216 citations
Attacks, Defenses And Evaluations For LLM Conversation Safety: A Survey
2024 · 147 citations
Psysafe: A Comprehensive Framework For Psychological-based Attack, Defense, And Evaluation Of Multi-agent System Safety
2024 · 82 citations
Codeattack: Revealing Safety Generalization Challenges Of Large Language Models Via Code Completion
2024 · 68 citations
Explainable And Interpretable Multimodal Large Language Models: A Comprehensive Survey
2024 · 67 citations
Safework-r1: Coevolving Safety And Intelligence Under The AI-45° Law
2025
Toolsafe: Enhancing Tool Invocation Safety Of LLM-based Agents Via Proactive Step-level Guardrail And Feedback
2026
Benchmarks For Trajectory Safety Evaluation And Diagnosis In Openclaw And Codex: Atbench-claw And Atbench-codex
2026

Topics

Safety & Alignment · Evaluation · Safety · Survey Paper · Code Agents · Fine-Tuning · Reinforcement Learning · Code · Vision-Language · Browser Agents