Beyond Content Filtering: A “Circuit Breaker” Architecture for Autonomous Agent Action Safety

H. Sridharan · Reshma Nair · 2026

Abstract

As Large Language Models (LLMs) evolve into “Agentic AI,” they are increasingly granted access to external tools and APIs to perform tasks autonomously. While significant research exists on filtering toxic textual output, safety mechanisms for functional execution, that is, the actions an agent actually carries out, remain critically lacking. A hallucinating agent with write access to a database or financial gateway poses a catastrophic risk that standard content moderation cannot mitigate. This paper introduces the “Action Circuit Breaker” (ACB), a deterministic middleware layer that sits between the LLM agent and external APIs and evaluates each structured action request against an explicit risk policy before execution. We provide (i) a minimal prototype design of an ACB deployed in front of REST endpoints and (ii) a simulation illustrating blocked vs. allowed actions, alongside a theoretical analysis of latency overhead ($<100\,\text{ms}$ per hop in typical gateway settings) and of risk reduction, which is subject to policy correctness and coverage.
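To make the idea of a deterministic policy check concrete, the following is a minimal Python sketch of how a structured action request might be evaluated against an explicit allow-list before it is forwarded to an API. It is an illustration under stated assumptions, not the authors' prototype: the `ActionRequest` and `Rule` schemas, the `evaluate` function, the `amount` parameter, and the default-deny behavior are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ActionRequest:
    """Structured action an agent asks to execute (hypothetical schema)."""
    method: str                      # HTTP verb, e.g. "GET" or "POST"
    path: str                        # target REST path, e.g. "/payments"
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Rule:
    """One allow-list entry of the risk policy (hypothetical schema)."""
    method: str
    path_prefix: str
    max_amount: Optional[float] = None   # optional cap on params["amount"]


def evaluate(request: ActionRequest, policy: List[Rule]) -> Tuple[bool, str]:
    """Return (allowed, reason). The first matching rule decides the outcome;
    a request that matches no rule is blocked (default deny, an assumption)."""
    for rule in policy:
        if request.method != rule.method:
            continue
        if not request.path.startswith(rule.path_prefix):
            continue
        if rule.max_amount is not None:
            amount = request.params.get("amount")
            # Block when the amount is missing, non-numeric, or over the cap.
            if not isinstance(amount, (int, float)) or amount > rule.max_amount:
                return False, "amount missing or above policy limit"
        return True, "allowed by policy rule"
    return False, "no matching rule (default deny)"


# Example policy: read access to accounts, small payments only.
policy = [
    Rule("GET", "/accounts"),
    Rule("POST", "/payments", max_amount=100.0),
]

read_ok = evaluate(ActionRequest("GET", "/accounts/42"), policy)
small_payment = evaluate(ActionRequest("POST", "/payments", {"amount": 50}), policy)
large_payment = evaluate(ActionRequest("POST", "/payments", {"amount": 5000}), policy)
delete_attempt = evaluate(ActionRequest("DELETE", "/accounts/42"), policy)
```

Because the check is a pure function of the request and the policy, the same request always yields the same decision, which is the sense in which such a layer is deterministic. Its protection extends only as far as the rules are correct and cover the actions the agent can emit.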
