A demo agent takes an afternoon. The plumbing that lets it touch production safely is the actual engineering work, and almost nobody shows it.
An AI coding agent recently deleted a startup's production database and then deleted its volume-level backups. The whole thing took about nine seconds. No malice, no exotic exploit. The agent had write access to a real system, decided a destructive action was the right next step, and nothing sat between the decision and the disk.
That story gets passed around as a horror anecdote. I read it as a design review. The agent worked exactly as built. What failed was everything around it, and that everything is the part most people skip.
Here is the contrarian claim I will defend: building the agent is the easy 20%. Wiring an agent into a real environment is mostly a control-plane problem, and the control plane is layered, boring, and where the actual engineering lives. If you can stand up a chatbot that calls a few tools in an afternoon, congratulations, you have finished the part that does not matter yet.
Let me walk through the layers I will not deploy an agent without. None of this requires a research budget. You can start adding these on Monday.
Scoped tools, not god-mode credentials
The first mistake is handing an agent a broad credential and a broad tool. "Run SQL" is not a tool. It is a loaded weapon with a natural-language trigger. A tool is something like get_customer_by_id(id): it returns three fields, runs as a role that can only read, and has a row cap.
This matters more now because the threat model is no longer hypothetical. In June, OWASP mapped prompt injection to 6 of the 10 categories in its agentic-AI Top 10. The LiteLLM PyPI supply-chain backdoor poisoned CrewAI, DSPy, and GraphRAG earlier this year. "ClawHavoc" planted malicious skills on a public marketplace. Tool poisoning is real: the description of a tool, the data it returns, and the instructions embedded in that data are all attacker-controllable surfaces. SearchLeak (CVE-2026-42824), the one-click Microsoft 365 Copilot exfiltration flaw disclosed in June, is the second flaw of its class, after EchoLeak. The pattern repeats because the model treats retrieved content as instructions.
So scope the tool, not just the prompt. Every tool gets least-privilege credentials of its own, a narrow input schema, output limits, and an allowlist of what it can reference. Assume any text the agent reads might be trying to redirect it, because increasingly it is.
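Here is a minimal sketch of what a scoped tool can look like. Everything in it is an assumption for illustration: the customers table, the three field names, and the use of SQLite's query_only pragma as a stand-in for a real read-only database role.

```python
import sqlite3
from typing import Optional

ALLOWED_FIELDS = ("id", "name", "plan")  # hypothetical: the only three fields exposed
ROW_CAP = 1  # hard output limit for this tool


def make_read_only(conn: sqlite3.Connection) -> sqlite3.Connection:
    # Stand-in for a read-only role: SQLite rejects writes on this connection.
    conn.execute("PRAGMA query_only = ON")
    return conn


def get_customer_by_id(conn: sqlite3.Connection, customer_id: int) -> Optional[dict]:
    # Narrow input schema: one positive integer, nothing else.
    if isinstance(customer_id, bool) or not isinstance(customer_id, int) or customer_id <= 0:
        raise ValueError("customer_id must be a positive integer")
    # Fixed, parameterized query: the agent never supplies SQL text.
    rows = conn.execute(
        "SELECT id, name, plan FROM customers WHERE id = ? LIMIT ?",
        (customer_id, ROW_CAP),
    ).fetchall()
    return dict(zip(ALLOWED_FIELDS, rows[0])) if rows else None
```

The point of the shape is that an injected instruction has nothing to grab: there is no SQL string to hijack, no column outside the allowlist, and no write path on the connection.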
A judge at the action boundary
A model deciding to act and a model being allowed to act should be two different things. Put a validator at the boundary where intent becomes effect.
This is not the same as asking the model "are you sure?" The judge is a separate check, ideally a different model or a deterministic rule set, that evaluates the proposed action against policy before it executes. Does this write touch a production table? Does this email leave the tenant? Does the cumulative blast radius of this run exceed a threshold? The judge answers, and only then does the tool fire.
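A deterministic version of that judge can be very small. The sketch below is one possible shape, not a reference implementation: the table names, the row threshold, and the ProposedAction fields are all hypothetical.

```python
from dataclasses import dataclass

PRODUCTION_TABLES = {"customers", "invoices"}  # hypothetical policy data
MAX_ROWS_PER_RUN = 100  # hypothetical blast-radius threshold


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    target: str
    writes: bool
    rows_affected: int = 0


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


class Judge:
    """Deterministic policy check; holds per-run state for cumulative limits."""

    def __init__(self) -> None:
        self.rows_so_far = 0

    def review(self, action: ProposedAction) -> Verdict:
        if action.writes and action.target in PRODUCTION_TABLES:
            return Verdict(False, "write touches a production table")
        if self.rows_so_far + action.rows_affected > MAX_ROWS_PER_RUN:
            return Verdict(False, "cumulative blast radius exceeds threshold")
        self.rows_so_far += action.rows_affected
        return Verdict(True, "within policy")


def execute(action: ProposedAction, judge: Judge, tools: dict):
    # The tool fires only after the judge has allowed the action.
    verdict = judge.review(action)
    if not verdict.allowed:
        return verdict, None
    return verdict, tools[action.tool](action)
```

Note that the agent never calls a tool directly in this arrangement. It can only hand a structured proposal to execute, which is what keeps the judge from being bypassed.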
A useful mental model: the agent proposes, the judge disposes, the tool executes. Three roles, never collapsed into one. The same logic that protects you also saves you money. I call it minimum effective intelligence: route each task to the cheapest model that still yields an accepted result. Your judge can often be a smaller, faster model or plain code, because validating a structured action against a policy is a narrower problem than generating it. Attribute cost per request, and cap budgets so a runaway loop bankrupts a line item instead of your quarter.
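To make the routing and the budget cap concrete, here is a rough sketch. It assumes models are plain callables listed cheapest first, that each call has a known cost in cents, and that "accepted" is whatever check you pass in. All names are hypothetical.

```python
class BudgetExceeded(RuntimeError):
    pass


class RunBudget:
    """Hard spending cap for a single run, tracked in integer cents."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def charge(self, cost_cents: int) -> None:
        # Refuse the call before spending, so a loop stops at the cap.
        if self.spent_cents + cost_cents > self.limit_cents:
            raise BudgetExceeded(f"run budget of {self.limit_cents} cents exhausted")
        self.spent_cents += cost_cents


def route(task, models, accept, budget: RunBudget):
    # models: list of (name, cost_cents, callable), ordered cheapest first.
    for name, cost_cents, call in models:
        budget.charge(cost_cents)
        result = call(task)
        if accept(result):
            return name, result
    raise RuntimeError("no model produced an accepted result")
```

Charging before the call rather than after is the design choice that matters here: a runaway loop hits BudgetExceeded instead of discovering the overspend on the invoice.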
Run-level observability, not session-level logs
Most teams log conversations. That is the wrong unit. When something goes wrong you do not want a transcript, you want the causal chain: which tool was called, with which arguments, derived from which retrieved document, under which credential, costing how much, and what the judge said before it passed.
Bedrock added request-level usage attribution last month. Microsoft Foundry shipped project-level cost attribution and brought Managed VNet to GA in the same window. These are not finance features. Per-run, per-request attribution is how you reconstruct an incident and how you catch the slow-motion failures: the agent that quietly retries a destructive call forty times, or the one whose costs spike because a poisoned document sent it into a loop. The FinOps Foundation named AI cost management the most wanted skill for 2026, with roughly 98% of organizations now managing AI spend. The teams that can attribute a dollar to a run are the same teams that can attribute a mistake to a run.
Log the run as a first-class object. Trace ID, tool calls, data lineage, judge verdicts, cost. If you cannot answer "why did the agent do that" from your telemetry, you do not have observability, you have a chat history.
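One way to picture a run as a first-class object, as a sketch only. The field names are my own and do not follow any particular tracing standard; the credential field is meant to hold an identity name, never the secret itself.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCall:
    tool: str
    arguments: dict
    source_document: str  # data lineage: what the call was derived from
    credential: str       # identity name only, never the secret
    cost_cents: int
    judge_verdict: str


@dataclass
class Run:
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    calls: list = field(default_factory=list)

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)

    @property
    def total_cost_cents(self) -> int:
        return sum(c.cost_cents for c in self.calls)

    def why(self, tool: str) -> list:
        # The causal chain for one tool: every call, in order.
        return [c for c in self.calls if c.tool == tool]

    def to_json(self) -> str:
        # One serializable record per run, ready for a log pipeline.
        return json.dumps(asdict(self), sort_keys=True)
```

With a record like this, "why did the agent do that" becomes a lookup by trace ID rather than a reread of a transcript.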
Graduated autonomy
Autonomy is not a switch. It is a ladder, and each capability climbs it one rung at a time, only after the rung below has earned trust.
- Read. The agent can observe and report. No writes, anywhere.
- Draft. It produces the change, the email, the query, but a human ships it.
- Staged write. It writes to a sandbox, a branch, a draft record. Real output, no production effect.
- Approved write. It writes to production behind explicit human approval per action or per batch.
- Autonomous, reversible only. It acts on its own, but only for actions you can cleanly undo.
That last rung encodes the rule the nine-second story violated. Deleting a database is not reversible. Deleting the backups is the opposite of reversible. An agent should never reach unsupervised autonomy for an irreversible action, full stop. Reversibility is the gate, not confidence, not accuracy, not how good the demo looked.
Most production agents I would actually trust live at "draft" or "approved write" for anything that matters, and "autonomous" only for narrow, idempotent, reversible operations. That is not timidity. That is matching authority to the cost of being wrong.
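The ladder translates into a small gate function. This is a sketch under my own simplifications: each action is tagged with the lowest rung that permits it, and any write to production is tagged APPROVED_WRITE.

```python
from enum import IntEnum


class Rung(IntEnum):
    READ = 0
    DRAFT = 1
    STAGED_WRITE = 2
    APPROVED_WRITE = 3
    AUTONOMOUS_REVERSIBLE = 4


def may_execute(
    granted: Rung,
    required: Rung,
    *,
    reversible: bool = True,
    human_approved: bool = False,
) -> bool:
    # The capability must have earned at least the rung the action needs.
    if required > granted:
        return False
    # Reads, drafts, and staged writes have no production effect.
    if required <= Rung.STAGED_WRITE:
        return True
    # Production write with explicit human approval.
    if human_approved:
        return True
    # Unsupervised production write: top rung only, and only if reversible.
    return granted == Rung.AUTONOMOUS_REVERSIBLE and reversible
```

In this sketch no rung lets an unapproved, irreversible production write through. Confidence and accuracy never appear as parameters, which is the point.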
A kill switch that actually kills
Every agent needs an off switch that a human can hit without a deploy, and it has to cut the thing that does damage: the credentials and the tool access, not just the chat UI.
This is now an identity problem more than a UI problem. The Cloud Security Alliance's May whitepaper put the ratio of non-human to human identities at roughly 45 to 1, and as high as 144 to 1 in some estimates. Every agent, every tool, every sub-agent is an identity with permissions. Auth0 shipped an agent-native identity stack with "Agent as Principal" and a token vault. Cloudflare shipped scannable API tokens with auto-revocation and resource-scoped roles. The European Identity Conference converged on OAuth 2.1 plus OpenID AuthZEN. The industry is building the primitives because the old model, a long-lived key in an environment variable, does not survive contact with an autonomous caller. GitGuardian found more than 1.27 million AI secrets on public GitHub last year, up 81%.
Your kill switch should revoke the token, not close the tab. Test it like you test backups, by actually pulling it and confirming the agent goes inert.
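As a sketch of what "revoke the token and prove it" means, here is an in-memory stand-in for a real identity provider. The CredentialStore class and its methods are hypothetical. In production the revocation would go to your actual token service, but the drill has the same shape.

```python
import secrets


class Revoked(PermissionError):
    pass


class CredentialStore:
    """In-memory stand-in for an identity provider that issues agent tokens."""

    def __init__(self) -> None:
        self._active = {}  # token -> agent_id

    def issue(self, agent_id: str) -> str:
        token = secrets.token_hex(16)
        self._active[token] = agent_id
        return token

    def revoke_agent(self, agent_id: str) -> int:
        # Kill switch: drop every live token for this agent, no deploy needed.
        doomed = [t for t, a in self._active.items() if a == agent_id]
        for token in doomed:
            del self._active[token]
        return len(doomed)

    def is_valid(self, token: str) -> bool:
        return token in self._active


def call_tool(store: CredentialStore, token: str, tool, *args):
    # Every tool call re-checks the credential, so revocation bites immediately.
    if not store.is_valid(token):
        raise Revoked("credential revoked")
    return tool(*args)


def kill_switch_drill(store: CredentialStore, agent_id: str, token: str, tool) -> bool:
    # Pull the switch on purpose, then confirm the agent is inert.
    store.revoke_agent(agent_id)
    try:
        call_tool(store, token, tool)
    except Revoked:
        return True
    return False
```

The drill returns False if a tool call still succeeds after revocation, which is exactly the failure you want to find in a test and not during an incident.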
Where this is heading
The protocol layer is converging fast. MCP, A2A, and AG-UI are settling into a stack for how agents discover tools, talk to each other, and talk to users. That is good: it means less custom glue. It also means the tool boundary, the agent-to-agent boundary, and the agent-to-human boundary are each becoming standard, inspectable, and therefore something you can put a control plane around. The standardization is not a reason to relax. It is the reason the layers described above become both possible and mandatory.
The model is a commodity that improves on its own schedule, faster than anyone's roadmap. Opus 4.8 shipped about 41 days after 4.7. Flagships now also disappear on a regulator's timeline, as we saw in June when two were pulled offline within days under an export-control directive. You do not control the model. You control the plane it acts through.
So build that. Scope the tools. Put a judge at the boundary. Log runs, not chats. Climb the autonomy ladder one rung at a time and never give irreversible actions away. Wire a kill switch to the credential, and pull it on purpose to prove it works.
The demo is the easy part. The control plane is the job.
What is the first layer you would add, or the one you have watched fail in production? I would rather argue about this in the comments than read another clean demo.
