A Shadow-Agent Discovery Playbook for Regulated FIs

Every financial institution I talk to has a shadow IT problem. They just don't call it that anymore, because the shape of it changed. It used to be a marketing analyst with a personal Dropbox. Now it's an "agent" — a script, a workflow in a SaaS tool, a browser extension, a Zapier-style automation, a copilot with API keys — that someone wired up to be helpful and that is now reading data, calling endpoints, and making decisions on its own schedule. Nobody filed a ticket. Nobody ran it through vendor review. And it has credentials.

I want to be precise about why this is different from the laptop-and-Dropbox era. A traditional shadow tool is passive: it sits there holding a copy of something. A shadow agent is active. It authenticates, it loops, it takes actions, and it often has the standing permission to do those actions again tomorrow without a human in the loop. In a regulated FI — where you and your downstream institutions are accountable for member data under GLBA, where examiners care about who can touch what — an unsanctioned thing that holds credentials and acts autonomously is not a productivity hack. It's an unmanaged third party that you've installed inside your own perimeter. That's the reframe the whole playbook hangs on.

The real mechanic: agents are third parties, so treat discovery as third-party risk

Here's the move that makes this tractable. Stop trying to invent a brand-new "AI governance" discipline from scratch, and instead route shadow agents into two control frameworks you already operate against: FFIEC third-party / vendor risk management and your SOC 2 change-control process. You don't need a new religion. You need to recognize that an agent with API access is functionally a vendor with a service account, and a new agent going live is functionally a change to your production environment. The frameworks already tell you what to do with both. The job is to make agents visible to those frameworks before they touch data.

So the playbook is three plain verbs — find, classify, gate — and each one maps to a control you can point an examiner or auditor at.

Find: discovery is an identity and egress problem, not an AI problem

The mistake people make is hunting for "AI." You can't grep your environment for intent. What you can do is hunt for the fingerprints an agent leaves, because an agent has to authenticate and it has to talk to a model somewhere. Those are both observable.

I'd run discovery on four surfaces in parallel. First, identity: enumerate every non-human identity you have (service accounts, OAuth grants, personal access tokens, API keys, third-party app authorizations in your IdP). In any cloud environment this is your richest signal, because an agent that does anything useful is assuming a role or holding a key. Pull the OAuth consent grants out of your identity provider and read them like a guest list; the long tail of third-party app authorizations nobody remembers approving is where the shadow agents live. Second, egress: agents call LLM inference endpoints. Watch DNS and your egress logs for traffic to the major model providers' API domains. Outbound calls to inference APIs from a system that has no business reasoning about anything are a tell. Third, SaaS: the biggest agent factory right now is the automation and "AI feature" toggle inside tools you already pay for. Inventory which sanctioned SaaS apps have agent or connector features enabled and what data scopes they were granted. Fourth, code and infra: scan repos and pipelines for SDK imports, agent frameworks, and API keys checked into places they shouldn't be.
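To make the egress surface concrete, here is a minimal Python sketch, assuming you can export DNS or proxy records as (source host, destination domain) pairs and keep a list of hosts that are approved to call model APIs. The provider hostnames are a short, illustrative sample rather than a complete list, and `EgressEvent`, `flag_unexpected_inference`, and the host names in the example are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: an illustrative, deliberately incomplete list of model-provider
# API hostnames. Build the real list from your proxy or DNS vendor's categories.
MODEL_PROVIDER_DOMAINS = frozenset({
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
})


@dataclass(frozen=True)
class EgressEvent:
    source_host: str  # internal system that made the outbound call
    dest_domain: str  # destination hostname taken from DNS or proxy logs


def normalize(domain: str) -> str:
    """Lowercase and drop a trailing root dot so 'API.OpenAI.com.' matches."""
    return domain.lower().rstrip(".")


def is_model_endpoint(domain: str) -> bool:
    """True if the domain is, or is a subdomain of, a listed provider domain."""
    d = normalize(domain)
    return any(d == p or d.endswith("." + p) for p in MODEL_PROVIDER_DOMAINS)


def flag_unexpected_inference(events, approved_hosts):
    """Map each unapproved source host to the model domains it called.

    Every key in the result is a candidate shadow agent: a system calling an
    inference API that nobody sanctioned to do so.
    """
    hits = {}
    for ev in events:
        if is_model_endpoint(ev.dest_domain) and ev.source_host not in approved_hosts:
            hits.setdefault(ev.source_host, set()).add(normalize(ev.dest_domain))
    return hits
```

For instance, with a hypothetical `payroll-batch-01` calling `api.openai.com` and an approved `support-bot` calling `api.anthropic.com`, passing `approved_hosts={"support-bot"}` returns `{"payroll-batch-01": {"api.openai.com"}}`. Matching on exact names and dot-prefixed suffixes, rather than substrings, keeps a lookalike such as `api.openai.com.evil.net` from matching.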

None of this requires exotic tooling. CASB, your IdP's app reporting, CSPM, cloud flow logs, secrets scanning — you almost certainly own these already. The reframe is simply pointing them at the question "what here is acting on its own with credentials?" and treating a positive hit as a discovered vendor relationship, which is exactly what your FFIEC program says you must inventory.
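One way to make "a positive hit is a discovered vendor relationship" literal is to reconcile the IdP's app grants against the vendor-review register and emit an inventory record for anything missing. This is a sketch under assumptions: the `AppGrant` fields, the `pending_vendor_review` status, and a register keyed by app ID are all hypothetical, since every IdP exports grants in its own shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppGrant:
    app_id: str             # client/application ID as exported from the IdP
    display_name: str
    scopes: frozenset[str]  # permission scopes the app was granted


def to_inventory_records(grants, vendor_register):
    """Turn every grant with no vendor-review entry into a pending inventory record.

    vendor_register is the set of app IDs that already passed vendor review.
    """
    records = []
    for g in sorted(grants, key=lambda x: x.app_id):
        if g.app_id in vendor_register:
            continue  # already a known, reviewed third party
        records.append({
            "app_id": g.app_id,
            "name": g.display_name,
            "scopes": sorted(g.scopes),
            # Hypothetical status value: the hit enters the vendor inventory
            # as a relationship awaiting review, not as a revocation.
            "status": "pending_vendor_review",
        })
    return records
```

Given a reviewed "CRM Sync" grant and an unreviewed "Meeting Summarizer" grant (both invented for illustration), it returns one pending record, for the summarizer. The output being an inventory entry rather than a kill switch is the point: it feeds the program your examiner already expects to see.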

Classify: data sensitivity and autonomy are the two axes that matter

Once you have a list, resist the urge to score everything to death. For a regulated FI, two questions decide an agent's risk tier, and you can answer both quickly.

What data can it reach? Public/internal, or member PII / nonpublic personal information? The instant an agent can touch member data, it crosses into GLBA territory and the bar goes way up.
How autonomous is it? Does it suggest and wait for a human, or does it take actions — write, send, transact, change config — without one? Read-only-with-human-in-the-loop is a different animal from write-and-act-unattended.

That two-by-two does most of the work. A read-only summarizer over public docs is a Tuesday. An agent with write access to anything member-facing, running unattended, is a stop-the-line event. The reason this maps cleanly to FFIEC is that your vendor program already classifies third parties by criticality and data access — you're just applying the same lens to an internal automaton. Document the classification the same way you'd document a vendor's data-handling tier, because when an examiner asks how you govern AI, "here is our agent inventory tiered by data sensitivity and autonomy, using the same criteria as our third-party program" is a genuinely strong answer.
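The two-by-two is small enough to encode as a lookup table. A minimal sketch follows; the enum names are hypothetical, the two corner tiers follow the examples above (a read-only summarizer over public docs, an unattended agent touching member data), and the two middle labels are assumptions to replace with whatever tiers your third-party program already uses.

```python
from enum import Enum


class DataReach(Enum):
    PUBLIC_OR_INTERNAL = "public_or_internal"
    MEMBER_NPI = "member_npi"  # member PII / nonpublic personal information (GLBA)


class Autonomy(Enum):
    HUMAN_IN_LOOP = "human_in_loop"      # suggests, then waits for a person
    ACTS_UNATTENDED = "acts_unattended"  # writes, sends, transacts, or changes config alone


# The two-by-two. The corner labels mirror the examples in the text; the
# middle tier names are ASSUMPTIONS, so rename them to match your vendor program.
TIERS = {
    (DataReach.PUBLIC_OR_INTERNAL, Autonomy.HUMAN_IN_LOOP): "low",
    (DataReach.PUBLIC_OR_INTERNAL, Autonomy.ACTS_UNATTENDED): "elevated",
    (DataReach.MEMBER_NPI, Autonomy.HUMAN_IN_LOOP): "high",
    (DataReach.MEMBER_NPI, Autonomy.ACTS_UNATTENDED): "stop-the-line",
}


def risk_tier(data: DataReach, autonomy: Autonomy) -> str:
    """Look up an agent's tier from its data reach and its autonomy."""
    return TIERS[(data, autonomy)]
```

Keeping the tiers in one table rather than in scattered `if` statements means the criteria can be shown to an examiner side by side with the vendor program's own.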

Gate: nothing reaches member data without passing change control

Discovery and classification are worthless if there's no gate. The gate is where SOC 2 change control earns its keep. The principle is simple and it should be non-negotiable: an agent does not get production credentials to member data until it has gone through the same review, approval, and logging that any other production change requires. That means a defined approval step with a named owner, a security review proportional to its tier, least-privilege scoping of whatever identity it runs as, and an audit trail of who approved it and why.
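The gate's requirements (a named owner, an approval with a recorded reason, review proportional to tier, least-privilege scopes) can be checked mechanically before a credential is issued. A minimal sketch, assuming a change request carries those fields; the `ChangeRequest` shape, the review-depth names, and the tier-to-review mapping are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTIONS: review depths and the tier names are illustrative, not taken
# from SOC 2 or any other standard.
REVIEW_RANK = {"none": 0, "light": 1, "full": 2}
REQUIRED_REVIEW = {"low": "light", "elevated": "full", "high": "full", "stop-the-line": "full"}


@dataclass(frozen=True)
class ChangeRequest:
    agent_id: str
    owner: str                        # named owner accountable for the agent
    approver: str                     # who approved the production change
    rationale: str                    # why it was approved (audit trail)
    tier: str                         # output of the classification step
    security_review: str              # depth of review actually performed
    requested_scopes: frozenset[str]
    justified_scopes: frozenset[str]  # scopes the review agreed are needed


def gate_findings(cr: ChangeRequest) -> list[str]:
    """Return reasons to block the change; an empty list means it may proceed."""
    findings = []
    if not cr.owner:
        findings.append("no named owner")
    if not (cr.approver and cr.rationale):
        findings.append("approval record incomplete")
    required = REQUIRED_REVIEW.get(cr.tier)
    if required is None:
        findings.append(f"unknown tier {cr.tier!r}")
    elif REVIEW_RANK.get(cr.security_review, 0) < REVIEW_RANK[required]:
        findings.append(f"security review {cr.security_review!r} is below {required!r}")
    excess = cr.requested_scopes - cr.justified_scopes
    if excess:
        findings.append(f"scopes exceed least privilege: {sorted(excess)}")
    return findings
```

An empty list means the request can move on to provisioning; anything else sends it back, and the findings themselves are a usable piece of audit evidence.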

Practically, you enforce this at the identity layer, because that's the chokepoint you actually control. Member-data scopes and write permissions get issued through your normal access-provisioning workflow — the one that already produces approval records your auditor samples. An agent that can't get a credential can't act. So the gate isn't a policy PDF; it's the fact that the path to sensitive data runs through a process that leaves evidence. Pair that with short-lived credentials over standing keys wherever you can, and monitoring on the non-human identities so that a gated agent which suddenly changes behavior throws an alert. Detection has to keep running after approval, because an agent's blast radius is defined by its permissions plus its autonomy, and both can drift.
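Post-approval monitoring can start as a comparison between what was approved and what the agent's identity was actually observed doing. This sketch assumes you can derive observed scopes and a count of unattended actions from your IdP or cloud audit logs, which it does not parse; the function name, its arguments, and the one-hour credential ceiling are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: an illustrative ceiling for "short-lived"; set it from your own policy.
MAX_CREDENTIAL_AGE = timedelta(hours=1)


def drift_alerts(agent_id, approved_scopes, observed_scopes, approved_unattended,
                 observed_unattended_actions, credential_issued_at, now=None):
    """Compare what a gated agent actually did against what was approved.

    Checks the two things that can drift (permissions and autonomy), plus
    whether the credential in use is still short-lived.
    """
    now = now or datetime.now(timezone.utc)
    alerts = []
    unapproved = set(observed_scopes) - set(approved_scopes)
    if unapproved:
        alerts.append(f"{agent_id}: used unapproved scopes {sorted(unapproved)}")
    if observed_unattended_actions and not approved_unattended:
        alerts.append(f"{agent_id}: acted {observed_unattended_actions} time(s) without a human")
    if now - credential_issued_at > MAX_CREDENTIAL_AGE:
        alerts.append(f"{agent_id}: credential older than {MAX_CREDENTIAL_AGE}")
    return alerts
```

Each returned string is meant to feed whatever alerting you already run on non-human identities, so a suggest-only agent that starts writing on its own gets noticed the same day.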

One more thing that's easy to skip: write down the sanctioned path. A lot of shadow agents exist because the legitimate route to "I want to automate this with AI" was undefined or took six weeks. If you make discovery a hunt-and-punish exercise without offering a fast, paved road to do it right, you train people to hide their agents better. The goal is to make the sanctioned path the path of least resistance.

The takeaway

You don't need a novel framework to govern shadow agents, and you don't have time to wait for one. You need to recognize that an autonomous thing holding credentials is a third party plus a production change, and you already know how to govern both of those. Find them through identity and egress signals. Classify them by data sensitivity and autonomy. Gate them through the change control that already produces your audit evidence.

So here's the challenge I'd put to any security leader at an FI: pull your IdP's list of OAuth and third-party app grants and your egress logs to model-provider endpoints, and see how many active agents you can find that never crossed your vendor-review desk. If you find even one with a path to member data, you don't have a future AI-governance project. You have an undocumented third party with credentials in production right now — and the clock on managing it started the moment it authenticated, not the moment you noticed.
