Safety
50 papers tagged Safety
Papers
- AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security (2026) | Dongrui Liu et al. | 14.78
- AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration (2026) | Jiaqi Liu et al. | 13.52
- AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security (2026) | Dongrui Liu et al. | 12.12
- It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs (2026) | Sangwoo Park et al. | 11.64
- Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms (2026) | Saidakhror Gulyamov et al. | 10.60
- NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation (2026) | NVIDIA et al. | 10.48
- From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors (2026) | Jiejun Tan et al. | 10.15
- RedAct: Redacting Agent Capability Traces for Procedural Skill Protection (2026) | Shuwen Xu et al. | 9.48
- SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning (2025) | Edward Y. Chang et al. | 9.36
- OmniVerifier-M1: Multimodal Meta-Verifier with Explicit Structured Recalibration (2026) | Xinchen Zhang et al. | 9.28
- Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows (2026) | Harshada Badave et al. | 8.80
- A Survey on Agent Workflow -- Status and Future (2025) | Chaojia Yu et al. | 8.10
- BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search (2026) | Shiyu Liu et al. | 7.94
- PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models (2026) | Ziliang Zhao et al. | 7.79
- Learning to Act under Noise: Enhancing Agent Robustness via Noisy Environments (2026) | Yuxin Chen et al. | 7.79
- Emergent Languages in Populations of Language Model Agents: From Token Efficiency to Oversight Evasion (2026) | Stine Lyngsø Beltoft et al. | 7.31
- Evaluating routing stability and coordination in swarm-based multi-agent task-oriented dialogue systems (2026) | Abuzar Khan et al. | 7.24
- Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing (2025) | Jinwei Hu et al. | 6.91
- Agent libOS: A Library-OS-Inspired Runtime for Long-Running, Capability-Controlled LLM Agents (2026) | Yingqi Zhang | 6.75
- The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development? (2026) | Xinyu Lu et al. | 6.75
- VerifyLLM: LLM-Based Pre-Execution Task Plan Verification for Robots (2025) | Danil S. Grigorev et al. | 6.69
- Agentic Uncertainty Quantification (2026) | Jiaxin Zhang et al. | 6.67
- Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents (2026) | Xiang Liu et al. | 6.46
- Safety Aware Task Planning Via Large Language Models In Robotics (2025) | Azal Ahmad Khan, Michael Andrev, Muhammad Ali Murtaza, et al. | 6.39
- Decentralized Safe Multi-agent Stochastic Optimal Control Using Deep FBSDEs And ADMM (2022) | Marcus A. Pereira, Augustinos D. Saravanos, Oswin So, et al. | 6.34
- Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS (2025) | Bingyu Yan et al. | 6.23
- Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults (2026) | Rana Muhammad Usman | 5.88
- Generating Safe Autonomous Decision-making In ROS (2022) | Yi Yang, Tom Holvoet | 5.84
- ProofAgent Harness: Open Infrastructure for Adversarial Evaluation of AI Agents (2026) | Fouad Bousetouane | 5.82
- LACUNA: Safe Agents as Recursive Program Holes (2026) | Yaoyu Zhao et al. | 5.82
- Open Challenges In Multi-agent Security: Towards Secure Systems Of Interacting AI Agents (2026) | Christian Schroeder de Witt, Klaudia Krawiecka, Igor Krawczuk, et al. | 5.58
- From Confident Closing to Silent Failure: Characterizing False Success in LLM Agents (2026) | Laksh Advani | 5.49
- Robustness without Wrinkles: Parallel Simulation and Robust MPC for Certified Deformable Manipulation (2026) | Wei-Chen Li et al. | 5.49
- From Shield to Target: Denial-of-Service Attacks on LLM-Based Agent Guardrails (2026) | Yuguang Zhou et al. | 5.49
- SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model (2026) | Xiaoxin Lu et al. | 5.49
- Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents (2026) | Yujun Zhou et al. | 5.46
- Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework (2026) | Chingkwun Lam et al. | 5.41
- Distributionally Robust Free Energy Principle for Decision-Making (2025) | Allahkaram Shafiei et al. | 5.29
- Existing LLMs Are Not Self-Consistent For Simple Tasks (2025) | Zhenru Lin et al. | 5.04
- Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification (2026) | Thanh Luong Tuan et al. | 5.01
- Provably Auditable and Safe LLM Agents from Human-Authored Ontologies (2026) | Aaron Sterling | 5.01
- Domain-Conditioned Safety in Frontier Computer-Using Agents: A 793-Episode Browser Benchmark, a Coding-Domain Cross-Reference, and a Reproducibility Audit of Recent Red-Teaming (2026) | Nicholas Saban | 5.01
- VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents (2026) | Yunhao Yang et al. | 5.01
- CASS-RTL: Correctness-Aware Subspace Steering for RTL Generation with LLMs (2026) | Mohammad Akyash et al. | 5.01
- Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems (2026) | Yingzhuo Liu | 5.01
- Staying with the Uncertainty: Uncertainty-Scaffolding Strategies for Artificial Moral Advisors in LLM-to-LLM Simulated Conversations (2026) | Salvatore Greco et al. | 5.01
- Reducing Hallucinations in Complex Question Answering using Simple Graph-based Retrieval-Augmented Generation (long version) (2026) | Christopher J. Wedge et al. | 5.01
- Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies (2026) | Christian Llanes et al. | 5.01
- Toward Secure LLM Agents: Threat Surfaces, Attacks, Defenses, and Evaluation (2026) | Yuchen Ling et al. | 5.01
- SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems (2026) | Tarun Sharma | 5.01