Field Notes

My takes on AI

Short, opinionated field notes on AI governance, agents, AI security, and FinOps — written for the CIOs, CISOs, and architects who have to ship.

Jun 25, 2026·Michael York·tools

Pick a model like you size a cluster, not like you pick a sports team

Frontier model selection is a routing and capacity-planning problem, not brand loyalty with an API key attached. Route each task to the minimum effective intelligence — the cheapest model that still produces an accepted result — and instrument every call with a task tag, model id, effort level, and cost. Reserve the expensive reasoning passes for genuinely hard, high-stakes work, pin to stable ids instead of floating 'chat-latest' aliases, and keep a fallback wired in so a suspension or rate-limit event degrades gracefully.

My take

Tier your tasks, route to minimum effective intelligence, pin your versions, and score calibration as seriously as accuracy. Boring is what survives a model getting suspended three days after launch.

Jun 24, 2026·Michael York·security

The agent is the easy part — the control plane is the job

Standing up an agent that calls a few tools takes an afternoon; wiring it into a real environment safely is the actual engineering. The control plane is layered and boring: scoped least-privilege tools instead of god-mode credentials, a separate judge at the action boundary (the agent proposes, the judge disposes, the tool executes), run-level observability instead of chat logs, graduated autonomy where reversibility — not confidence — is the gate, and a kill switch that revokes the token rather than closing the tab.

My take

An agent should never reach unsupervised autonomy for an irreversible action. Match authority to the cost of being wrong, and pull your kill switch on purpose to prove it works.

Jun 23, 2026·Michael York·security

Stop trying to patch prompt injection

Prompt injection is not a defect a vendor will eventually close. It is a property of how LLMs read context, the same way SQL injection was a property of mixing code and data — the model has no reliable way to tell your trusted instructions apart from text that arrives in a document, a web page, or a tool result. So stop hardening inputs against this week's jailbreak and design systems that stay safe when the model is fully manipulated: least-privilege tool scopes, egress controls with real DLP, the dual-LLM quarantine pattern, and an SBOM for your AI frameworks.

My take

Assume injection succeeds and engineer the blast radius down to nothing. The governance question is not whether you trust the model — it is what each identity is permitted to do on its worst day.

Jun 22, 2026·Michael York·security

Your agents already outnumber your people — nobody is governing their credentials

Non-human identities — service accounts, CI runners, API keys, and now AI agents — already outnumber humans by a wide margin, and almost no one governs them like a workforce. Agents can authenticate, but they cannot prove they are authorized, and every SOC 2 and HIPAA control was written for humans who can. The fix is mechanical: an agent identity registry, just-in-time scoped credential issuance, token vaulting, revocation measured in minutes not days, and an audit trail tied to the principal.

My take

Start with the registry — you cannot govern what you have not counted. Do this and most of the scary AI-agent incident reports become ordinary, contained access-control events.

Jun 21, 2026·Michael York·tools

Your AI bill is the new cloud bill, and nobody is watching the meter

We spent a decade learning cloud FinOps and threw all of it out for LLM spend, where the unit of cost is the request and autonomous agents generate the requests — a looping agent can burn a quarter's budget in eleven days. Meter at an internal gateway, not the invoice: stamp every call with team, feature, agent, model, token counts, and a derived cost. Route to minimum effective intelligence, enforce hard caps inline so a runaway loop downgrades or refuses instead of forwarding, tag agents like cost centers, and run a boring weekly review on cost per accepted result.

My take

A budget you only read about after the fact is a postmortem, not a control. The meter is already running — the only question is whether you have built the dashboard yet.

Jun 20, 2026·Michael York·tools

Design your AI inference like the model could vanish tomorrow

A frontier model launched and was pulled offline three days later under an export-control directive — single-provider inference is now a continuity risk on the same tier as a single-AZ database. Put a gateway in front that speaks one stable contract, route to a primary and an independent fallback owned by a different company, pin to immutable model ids with a scheduled re-validation cadence, and repatriate the data that legally cannot leave to open-weight models inside a controlled VPC.

My take

Design for the version that disappears and the upgrades take care of themselves. Failover is just routing with a different trigger — the same table that picks the cheapest passing model is the one you flip when a provider goes dark.

Jun 19, 2026·Michael York·policy

The boundary layer is the actual AI control

Every governance framework I've read — NIST AI RMF, ISO 42001, the EU AI Act, the newer CCPA ADM rules — describes roughly the same control set: inventory, risk classification, documentation, testing, monitoring, human oversight. They're all correct. None of them are the actual control. The actual control is one design decision, made once, per system: does this output get acted on, or does it get interpreted first?

My take

Draw the boundary first. Everything else follows.

Jun 18, 2026·Michael York·policy

The 2026 AI regulatory map that fits on one page

Everyone read 'EU AI Act deferred to 2027' and exhaled, but the part that can fine you 3% of global turnover — GPAI enforcement — turns on August 2, 2026. Four rules with real teeth (EU GPAI, the US Treasury's 230 financial-services control objectives, Texas TRAIGA's NIST safe harbor, and NIST's Cyber AI Profile) all describe the same underlying functions in different dialects. Adopt NIST AI RMF — Govern, Map, Measure, Manage — as your internal source of truth and you answer most questionnaires by reindexing, not redoing.

My take

Build an evidence pipeline, not a binder: a self-updating inventory, request-level attribution, automated control checks mapped to NIST functions, and tamper-evident retention. The four regulations stop being four programs and become four views of one dataset.

Jun 17, 2026·Michael York·policy

Bake the audit evidence into your AI pipeline before the examiner asks

Audit-defensibility is not a document you write after the fact — it is a non-functional requirement you engineer in from the first commit, like a latency budget. Map each control objective to a concrete artifact (a log line, a row, an approval record), capture provenance and decision logs as first-class data with pinned model ids and hashed inputs, and put a trust layer between generation and anything anyone relies on: deterministic checks that block on a failed reconciliation, plus an independent hostile-reviewer pass. Gate AI-touched data migrations with canaries, a rejected-record quarantine, row-count reconciliation, and human sign-off.

My take

Never let the model self-certify production data — the moment the generator is also the validator, your evidence is worthless. Build systems whose normal operation emits audit evidence as exhaust.

May 11, 2026·Michael York·security

AI raises the floor for attackers far more than the ceiling

What genuinely changes is cost and scale on the easy stuff — fluent personalized phishing, faster reconnaissance, quick first drafts of malicious code. What doesn't change, at least not yet, is the hard part: breaking into a well-designed, well-monitored environment still takes the same fundamentals. The least-skilled attacker got meaningfully more capable; the determined, skilled adversary against a serious defender did not move much.

My take

If your defense against social engineering was 'we trained people to spot bad grammar,' that strategy just expired. Harden the human layer and keep your fundamentals boring and excellent.

Apr 9, 2026·Michael York·research

Most agent deployments automate dysfunction at machine speed

The security problem with agents is organizational, not cryptographic. If the process underneath is broken, an agent just runs the broken process faster. The unglamorous controls — scoped authority, the label that decides whether an output gets acted on or interpreted — are what decide whether the glamorous stuff ages well.

Feb 17, 2026·Michael York·research

Automate the boring, not the judgment

Automate the work that is repetitive, well-defined, and low-variance — gathering, summarizing, correlating, the recurring 'what changed overnight' sweep. Keep for humans the work that is rare, ambiguous, or carries real downside if it's wrong. The payoff isn't a smaller team; it's a team whose scarce attention lands where it's irreplaceable.

RSS·JSON feed

Case Studies & Practice

Open Source

My takes on AI

Pick a model like you size a cluster, not like you pick a sports team

The agent is the easy part — the control plane is the job

Stop trying to patch prompt injection

Your agents already outnumber your people — nobody is governing their credentials

Your AI bill is the new cloud bill, and nobody is watching the meter

Design your AI inference like the model could vanish tomorrow

The boundary layer is the actual AI control

The 2026 AI regulatory map that fits on one page

Bake the audit evidence into your AI pipeline before the examiner asks

AI raises the floor for attackers far more than the ceiling

Most agent deployments automate dysfunction at machine speed

Automate the boring, not the judgment