arXiv: 2604.22136
Authors: Jun He, Deying Yu
Primary category: cs.CR · all: cs.CR, cs.LG
Matched keywords: large language model, llm, agent, agentic, reasoning, latency
TL;DR
SAL is a control-plane architecture that decouples LLM reasoning from execution: models emit structured intents with justifications, and a validator checks them against true state and policy before any mutation. In the paper's benchmark, the prototype blocks every unsafe intent (93% at the policy layer, the remainder via consistency checks) with 12.4 ms median added latency.

Key Ideas
- Direct coupling of stochastic LLM outputs to execution APIs is an unsound safety model.
- Separate intent emission (model) from intent validation + execution (control plane).
- Add an obfuscation membrane to hide identity-sensitive state from the model.
- Maintain a cryptographically linked Evidence Chain for audit and deterministic replay (a sketch follows this list).
- Formal guarantees: policy-bounded execution, identity isolation, replay determinism.
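
The Evidence Chain is the most mechanical of these ideas, so a small sketch helps. The snippet below is a minimal illustration of a hash-linked decision log; the record fields (`prev`, `intent`, `verdict`, `reason`, `hash`) and helper names are assumptions, since the abstract does not specify the schema. Each record commits to its predecessor's hash, so the log is tamper-evident and can be re-walked deterministically.

```python
# Minimal sketch of a hash-linked Evidence Chain (field names are
# assumptions, not the paper's schema). Each record commits to the
# previous record's hash, so mutating any entry breaks the link.
import hashlib
import json

GENESIS = "0" * 64


def record_hash(record: dict) -> str:
    # Canonical JSON (sorted keys) makes hashing deterministic.
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def append_decision(chain: list[dict], intent: dict, verdict: str, reason: str) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    record = {"prev": prev, "intent": intent, "verdict": verdict, "reason": reason}
    record["hash"] = record_hash(record)
    chain.append(record)


def verify_chain(chain: list[dict]) -> bool:
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev"] != prev or record_hash(body) != record["hash"]:
            return False  # tamper-evident: any mutation breaks the link
        prev = record["hash"]
    return True


chain: list[dict] = []
append_decision(chain, {"action": "scale", "args": {"replicas": 3}}, "allow", "within policy")
append_decision(chain, {"action": "delete_volume", "args": {"id": "vol-1"}}, "deny", "destructive op")
assert verify_chain(chain)
```

Because hashing is over canonical JSON, re-walking the chain reproduces every hash bit-for-bit, which is what deterministic replay requires.
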
Approach
Models produce structured intents (action, args, justification) rather than raw API calls. The control plane:
- Resolves intents against ground-truth system state (not model-hallucinated state).
- Runs policy checks + consistency checks against inventory/identity.
- Applies the obfuscation membrane so the model never sees sensitive identifiers.
- Appends each decision to the hash-linked Evidence Chain, enabling replay (a pipeline sketch follows this list).
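
A compact sketch of this loop, assuming a toy allow-list policy, a toy inventory, and hypothetical names (`policy_check`, `consistency_check`, `redact_for_model`); the paper's actual policy language and state model are not described in the abstract.

```python
# Illustrative intent-validation loop under stated assumptions; the
# policy rules, state shape, and function names are placeholders,
# not the paper's implementation.
from dataclasses import dataclass


@dataclass
class Intent:
    action: str          # e.g. "resize_disk"
    args: dict           # structured arguments, never a raw API call
    justification: str   # model-supplied rationale, logged for audit


# Ground-truth state lives in the control plane, not in model context.
TRUE_STATE = {"instances": {"i-1": {"status": "running"}}}
ALLOWED_ACTIONS = {"resize_disk", "restart"}  # assumed policy table


def policy_check(intent: Intent) -> bool:
    # Gate 1: static policy (an allow-list here; real policies are richer).
    return intent.action in ALLOWED_ACTIONS


def consistency_check(intent: Intent) -> bool:
    # Gate 2: the intent must reference real inventory in its true state,
    # not state the model hallucinated.
    inst = TRUE_STATE["instances"].get(intent.args.get("instance"))
    return inst is not None and inst["status"] == "running"


def redact_for_model(state: dict) -> dict:
    # Obfuscation membrane: replace identity-sensitive identifiers with
    # opaque handles before any state is shown to the model.
    return {"instances": {f"handle-{i}": {"status": v["status"]}
                          for i, v in enumerate(state["instances"].values())}}


def validate(intent: Intent) -> str:
    if not policy_check(intent):
        return "deny:policy"
    if not consistency_check(intent):
        return "deny:consistency"
    return "allow"  # only now may the executor mutate real state


print(validate(Intent("resize_disk", {"instance": "i-1"}, "disk at 90%")))  # allow
print(validate(Intent("delete_volume", {"instance": "i-1"}, "cleanup")))    # deny:policy
print(validate(Intent("restart", {"instance": "i-404"}, "stuck")))          # deny:consistency
print(redact_for_model(TRUE_STATE))  # what the model is allowed to see
```

The key property is that model output never touches an execution API directly; both gates consult control-plane state the model cannot see or forge.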

Experiments
The prototype, OpenKedge, targets cloud infrastructure operations. The benchmark mixes safe and unsafe intents and measures blocking at two gates: the policy layer and the consistency layer. Metrics: unsafe-intent block rate, residual unsafe executions, and added latency. The abstract names no baseline architectures.
Results
- 93% of unsafe intents blocked at policy layer.
- Remaining 7% rejected by consistency checks → 0 unsafe executions.
- 12.4 ms median added latency.
- The abstract does not report the false-positive rate on safe intents, throughput, or scale.

Why It Matters
For agent and AI-infra practitioners, SAL offers a deployable pattern: treat the LLM as an untrusted intent source and move correctness/alignment guarantees into a deterministic, auditable control plane. This bounds the blast radius of hallucinations and prompt injection without retraining the model.
Connections to Prior Work
- Capability-based security and reference monitors (classic OS security).
- Tool-use / function-calling agents (ReAct, Toolformer) — SAL adds a validation layer on top.
- Constitutional AI and guardrails (NeMo Guardrails, LlamaGuard) — complements content filtering with state-grounded policy checks.
- Verifiable logs / tamper-evident audit trails (certificate transparency, hash chains).
Open Questions
- How are policies authored and kept in sync with evolving system APIs?
- False-rejection rate on legitimate intents and recovery UX?
- Does the obfuscation membrane degrade task success when identity context is genuinely needed?
- Scalability of the Evidence Chain under high-throughput agent workloads.
- Robustness against adversarial justifications crafted to pass policy checks.
Original abstract
Large language model (LLM) agents increasingly issue API calls that mutate real systems, yet many current architectures pass stochastic model outputs directly to execution layers. We argue that this coupling creates a safety risk because model correctness, context awareness, and alignment cannot be assumed at execution time. We introduce Sovereign Agentic Loops (SAL), a control-plane architecture in which models emit structured intents with justifications, and the control plane validates those intents against true system state and policy before execution. SAL combines an obfuscation membrane, which limits model access to identity-sensitive state, with a cryptographically linked Evidence Chain for auditability and replay. We formalize SAL and show that, under the stated assumptions, it provides policy-bounded execution, identity isolation, and deterministic replay. In an OpenKedge prototype for cloud infrastructure, SAL blocks 93% of unsafe intents at the policy layer, rejects the remaining 7% via consistency checks, prevents unsafe executions in our benchmark, and adds 12.4 ms median latency.