A decade ago, the thing that kept security leaders up at night was the SaaS app nobody approved. Someone in marketing swiped a corporate card, signed up for a tool, wired it into the CRM, and moved on. We called it shadow IT, built a whole product category around discovering it, and mostly got it under control through a combination of SSO, expense visibility, and CASB tooling.
Shadow AI is the same disease with a worse prognosis. The pattern is identical — capable people routing around slow process to get work done — but the artifacts are far more dangerous, and they don't announce themselves with a recurring invoice. A prototype doesn't show up on a vendor spend report. A notebook doesn't have a billing contact. A side-channel agent someone stood up to triage support tickets has no owner in your CMDB. And every one of them tends to be holding a secret.
The real mechanic: prototypes are credential-bearing assets that no one owns
Here's what I want every CISO and head of platform to internalize. The risk isn't that your engineers are experimenting with AI. That's good — I want them experimenting. The risk is the half-life of what they build. A proof of concept gets stood up in an afternoon to answer a real question: can an LLM classify these documents, can an agent draft these responses, can this embedding pipeline find the duplicates. The question gets answered. The demo gets a thumbs-up in a meeting. And then everyone moves on to the next thing.
What stays behind is not nothing. It's a running or recoverable artifact that almost certainly contains some combination of a hardcoded API key, a database connection string pasted in to "just get it working," a service-account token with more scope than it needed, and (this is the part people underestimate) a sample of real production data that someone pulled in because synthetic data was too much friction. In regulated environments that sample is the whole problem. I work at a company that serves more than 1,500 financial institutions, and "we copied a slice of real consumer financial data into a notebook to test the model and forgot about it" is not a sentence that survives an audit, a regulator conversation, or a customer's security questionnaire.
So the mental model I push is this: stop thinking of prototypes as experiments and start thinking of them as unmanaged assets with embedded credentials and unknown data classification. Once you frame them that way, the program writes itself. You don't manage a prototype graveyard with a policy memo telling people not to make prototypes. You manage it the way you manage any asset class you don't fully control: discover it, classify it, and give it a lifecycle that ends.
Step one: discovery, because you can't sunset what you can't see
You already own most of the discovery surface — you just haven't pointed it at this problem. Secret scanning is the obvious starting point, and it's free leverage. If you're running secret detection only on your main application repos, you're scanning the tidy part of the house. Turn it on across every repo, every notebook directory, every personal sandbox, your internal package registries, and your wiki and ticketing systems, because that's where the "here's how to run my thing" snippets with live keys actually live. Notebooks deserve special attention: the .ipynb format quietly serializes cell outputs, which means a query result containing real records can be sitting in version control even when the code looks clean.
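To make the notebook point concrete, here is a minimal sketch of scanning both the code and the serialized outputs of a notebook. It assumes the nbformat 4 JSON layout (a top-level "cells" list), and the three patterns are illustrative placeholders of my own, not a real ruleset; a purpose-built secret scanner will do far better.

```python
import json
import re

# Illustrative patterns only (assumption); a real scanner ships many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_assignment": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[=:]\s*['\"][^'\"\s]{16,}['\"]"
    ),
}


def _as_text(value):
    """Notebook JSON stores text as either a string or a list of strings."""
    if isinstance(value, list):
        return "".join(str(v) for v in value)
    return value if isinstance(value, str) else ""


def scan_notebook(nb):
    """Return (cell_index, location, rule_name) for each hit in a parsed notebook."""
    findings = []
    for idx, cell in enumerate(nb.get("cells", [])):
        chunks = [("source", _as_text(cell.get("source")))]
        for out in cell.get("outputs", []):
            # Stream outputs keep text under "text", rich outputs under
            # data["text/plain"], and error outputs under "traceback".
            chunks.append(("output", _as_text(out.get("text"))))
            chunks.append(("output", _as_text(out.get("data", {}).get("text/plain"))))
            chunks.append(("output", _as_text(out.get("traceback"))))
        for location, text in chunks:
            for rule, pattern in SECRET_PATTERNS.items():
                if pattern.search(text):
                    findings.append((idx, location, rule))
    return findings


def scan_notebook_file(path):
    """Convenience wrapper: load an .ipynb file from disk and scan it."""
    with open(path, encoding="utf-8") as fh:
        return scan_notebook(json.load(fh))
```

A hit with location "output" is the case worth calling out to teams: the code cell is clean, but the saved result is not.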
Then widen the aperture beyond code. Inventory the model and AI surface the same way you'd inventory servers. Look at API usage against the LLM providers and your gateway logs — every distinct key, project, and calling identity is a candidate agent or prototype. Look at your cloud accounts for the tell-tale shapes: an idle GPU instance, an inference endpoint with no traffic, a vector store nobody queries, a storage bucket full of embeddings. Look at the IAM side, because the most reliable signal of a forgotten prototype is a long-lived access key or service account that authenticated heavily for two weeks and then went silent. That silence is the graveyard.
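The "busy for two weeks, then silent" signal is easy to approximate once you can export authentication events. The sketch below is hedged accordingly: the input shape (pairs of credential ID and date) and every threshold are assumptions for illustration, and you would tune them against your own logs.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_silent_credentials(auth_events, today, min_events=50, burst_days=21, silent_days=60):
    """Flag credentials that authenticated heavily in a short window, then stopped.

    auth_events: iterable of (credential_id, date) pairs (hypothetical export format).
    Returns a list of (credential_id, last_seen) sorted by last_seen, oldest first.
    """
    by_cred = defaultdict(list)
    for cred_id, day in auth_events:
        by_cred[cred_id].append(day)

    flagged = []
    for cred_id, days in by_cred.items():
        first, last = min(days), max(days)
        burst = (last - first) <= timedelta(days=burst_days)    # all activity in a short window
        busy = len(days) >= min_events                          # and there was a lot of it
        silent = (today - last) >= timedelta(days=silent_days)  # and nothing since
        if burst and busy and silent:
            flagged.append((cred_id, last))
    return sorted(flagged, key=lambda item: item[1])
```

Anything this returns is a candidate, not a verdict. The point is to produce a short list that a person can match against the cloud and gateway inventory.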
Step two: classify, so the response is proportional
Discovery without classification just produces a giant anxious list. The point of classification is to separate the harmless from the radioactive so you spend your response budget correctly. For each artifact you find, you're answering three questions: Does it hold a credential, and is that credential still valid? What data does it touch or contain, and what's the highest classification in that set? And does it have a living owner who will claim it?
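Those three questions fit in a small record, which helps when the inventory is long. What follows is only a sketch: the four classification levels, the field names, and the scoring weights are all assumptions of mine, chosen to show the shape rather than to prescribe values.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class DataClass(IntEnum):
    # Hypothetical levels; substitute your own classification scheme.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    REGULATED = 3


@dataclass
class Artifact:
    name: str
    credential_valid: Optional[bool]  # None means no credential was found
    data_classes: List[DataClass]     # every class seen in or touched by the artifact
    owner: Optional[str]              # None means nobody has claimed it

    @property
    def highest_data_class(self):
        # The response is driven by the most sensitive data present.
        return max(self.data_classes, default=DataClass.PUBLIC)


def triage_priority(artifact):
    """Higher score means a human should look sooner. Weights are arbitrary."""
    score = int(artifact.highest_data_class) * 10
    if artifact.credential_valid:
        score += 25  # a live credential is an open exposure
    if artifact.owner is None:
        score += 5   # nobody is going to fix this on their own
    return score
```

Sorting the inventory by this score in descending order gives responders a queue instead of a list.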
The data question is where I'd invest the most. Automated classification — pattern matching for things that look like account numbers, SSNs, emails, tokens, plus content inspection on whatever the prototype ingested — turns "we found 400 notebooks" into "11 of these touched regulated data and need a human this week." That triage is the entire value of the program. A weekend script that summarizes public docs and a notebook holding a year of real transactions are both "shadow AI," and treating them identically is how you either cry wolf or miss the fire.
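As a rough illustration of the pattern-matching half, here is a small detector for things that look like emails, US SSNs, and payment card numbers. The regexes are deliberately simple assumptions and will produce both false positives and misses; the Luhn checksum is there to cut down noise on card-like digit runs.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect(text):
    """Return the set of sensitive-data labels whose pattern appears in text."""
    hits = set()
    if EMAIL.search(text):
        hits.add("email")
    if SSN.search(text):
        hits.add("ssn_like")
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.add("card_number_like")
    return hits
```

Run it over whatever the prototype ingested or emitted, not just its source. A single "ssn_like" or "card_number_like" hit is enough to route the artifact to a human.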
Step three: a sunset workflow that actually ends things
This is where most well-intentioned cleanups die, because "demotion" gets treated as a one-time spring cleaning instead of a standing capability. Sunset has to be a real workflow with states and an owner, not an email asking people to please delete their old stuff. The states I'd run look like this:
- Claimed or orphaned. Notify the apparent owner with a deadline. No response means it's orphaned, and orphaned defaults to demotion — not indefinite preservation "just in case."
- Contain before delete. The first action on anything risky is to revoke the credential and cut the data access, immediately, before any debate about whether the project still has value. Revoking the key is recoverable for the owner, who can be issued a new, properly scoped credential if the project turns out to matter, and it decisively closes the exposure. Do that first, always.
- Promote or retire. If a prototype is genuinely valuable, it graduates into the managed world — real secrets management, scoped service identity, proper data handling, an owner on the hook. If it's not worth that effort, that is your answer: retire it, snapshot what's needed for the record, and destroy the rest.
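One way to keep those states honest is to encode them as an explicit state machine so an artifact can't skip containment or sit in limbo. The sketch below is an assumption about how the states might be wired, not a description of any particular tool. In particular it places containment before owner notification for every artifact, which is stricter than the text requires for low-risk items.

```python
from enum import Enum


class State(Enum):
    DISCOVERED = "discovered"
    CONTAINED = "contained"            # credential revoked, data access cut
    OWNER_NOTIFIED = "owner_notified"  # deadline clock is running
    CLAIMED = "claimed"
    ORPHANED = "orphaned"
    PROMOTED = "promoted"              # adopted into the managed world
    RETIRED = "retired"                # snapshotted for the record, then destroyed


# Legal transitions (hypothetical wiring). Orphaned artifacts can only be retired.
ALLOWED = {
    State.DISCOVERED: {State.CONTAINED},
    State.CONTAINED: {State.OWNER_NOTIFIED},
    State.OWNER_NOTIFIED: {State.CLAIMED, State.ORPHANED},
    State.CLAIMED: {State.PROMOTED, State.RETIRED},
    State.ORPHANED: {State.RETIRED},
    State.PROMOTED: set(),
    State.RETIRED: set(),
}


def advance(current, target):
    """Move to target if the workflow allows it, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def resolve_notification(owner_responded, deadline_passed):
    """Decide what a notified artifact becomes: silence past the deadline means orphaned."""
    if owner_responded:
        return State.CLAIMED
    return State.ORPHANED if deadline_passed else State.OWNER_NOTIFIED
```

The two terminal states are the point: every artifact ends up either promoted or retired, and nothing stays "notified" forever.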
The promotion path matters more than the deletion path, because it's what keeps this from being security playing whack-a-mole against the business. The deal you're offering builders is fair and worth saying out loud: experiment all you want, but the moment something becomes load-bearing, it comes into the light and gets adopted properly. You make the right path the easy path — a sanctioned sandbox with secrets injected at runtime, scoped credentials, synthetic data on tap, and a default expiration on anything that hasn't been promoted. Friction is what created the graveyard; remove it and you stop refilling the cemetery.
The challenge
Shadow AI didn't appear because your people are reckless. It appeared because building with AI got dramatically cheaper than governing it, and the gap between those two costs is exactly where your secrets are leaking out. You will not close that gap by banning the experimentation — you'll just push it somewhere you can't see, which is the only outcome worse than where you are now.
So here's the challenge I'd put to any security or platform leader reading this. Don't ask your team whether you have a shadow AI problem; you do. Ask them a harder question: if a prototype someone built six months ago is right now holding a valid production key and a copy of real customer data, how would we find it, and what happens next? If you can't answer both halves of that with a name and a workflow, that's the most important thing on your roadmap — and it's a weekend of secret scanning away from being started.
