Dark Code Is a Control Failure, Not Tech Debt

There's a comfortable lie we tell ourselves about the code our AI tools generate. We call it tech debt. We say it the way you'd say a kitchen needs repainting — a someday problem, a backlog item, a thing we'll get to after the next release. It's a soothing word because debt is something you choose to take on and pay down on your own schedule.

I want to retire that framing. The growing pile of AI-generated code that no human on your team actually understands is not debt. It's a control failure. And the difference matters, because you fix debt with refactoring sprints and you fix control failures by not letting them happen in the first place.

The question every change has to survive

Strip away the tooling and every engineering organization runs on one question: who decides this is correct? Not who wrote it. Who stands behind it. For decades that question had an obvious answer, because the person who typed the code was the person who understood it, and the reviewer was a second human who could reconstruct the author's reasoning well enough to approve it. Correctness had an owner with a pulse.

AI-generated code that nobody comprehends fails that question quietly. The diff looks plausible. The tests are green. The PR has an approver's name on it. But if you walked up to that approver and asked them to explain why the retry logic backs off the way it does, or what happens to that auth token on the error path, they couldn't tell you. They didn't decide it was correct. They decided it looked correct, which is a different and much weaker claim. I call this dark code: code that runs in production while the question of its correctness has no real owner. The model produced it, the human waved it through, and the accountability evaporated somewhere in between.

In a fintech serving more than 1,500 financial institutions, that evaporation isn't abstract. Every line that touches a consumer's financial data is something a regulator, an auditor, or a partner's due-diligence team may eventually ask us to explain. "The AI wrote it and it passed CI" is not an answer I can give in that room. It's not an answer any of us should be comfortable giving in any room.

Why "tech debt" is the wrong diagnosis

Tech debt is code you understand and wish were better. Dark code is code you ship and don't understand at all. Those are not the same species of problem, and treating them the same way is how you end up surprised.

The reason this matters operationally: debt degrades gracefully and dark code fails catastrophically. Messy-but-understood code slows you down — features take longer, onboarding is harder, the estimate is always optimistic. Painful, but linear. Dark code doesn't slow you down at all until the day it does something nobody predicted, in a path nobody read, under conditions nobody tested, and then you're doing incident archaeology on a function no living person ever reasoned about. You can't grep your way out of a comprehension gap. The cost isn't paid in velocity; it's paid in blast radius.

And here's the part that should bother every security and platform leader: AI has made producing dark code nearly free, while the work of understanding it stayed exactly as expensive as it always was. We have automated the supply of code and left the supply of comprehension flat. That gap is where the risk lives, and it widens with every sprint we don't address it.

The real mechanic: make comprehension a gate, not a virtue

Here's the reframe I keep coming back to. We already know how to stop unaccountable changes from reaching production, because we've been gating other kinds of change for years. You can't merge an infrastructure change without a plan and an approver. You can't push to a protected branch without review. You can't deploy without the pipeline passing. We encode our controls as gates, not as aspirations, because we learned long ago that "everyone should be careful" is not a control. It's a wish.

So stop treating comprehension as a virtue you hope your engineers practice, and start treating it as a gate that code has to clear. Concretely, three things have to be true before AI-generated code earns a path to production:

Provenance. You should be able to answer, for any change, how it was produced — human-authored, AI-assisted, or substantially AI-generated. This isn't about shaming the model; it's about routing. A substantially generated change gets a different level of scrutiny than a three-line human fix, the same way a change to an auth service gets more eyes than a change to a marketing page. You can't apply proportional control to a signal you don't capture.
Ownership. Every merge needs a human who is willing to say, on the record, "I understand this and I stand behind it." Not "I skimmed it." Not "the tests passed." A named person whose understanding is the thing being attested. If no one will make that claim, the change does not merge. The model is a collaborator; it is never the accountable party.
Eval gates. Tests prove behavior on the cases you thought of. For generated code you also need evaluation that probes the cases the author didn't think of — property-based checks, adversarial inputs, behavioral diffs against the prior implementation, and for security-relevant paths, explicit checks that the generated logic does what the spec says rather than merely what makes the tests green. Green CI is necessary; it is nowhere near sufficient.
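To make the behavioral-diff idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: legacy_backoff stands in for a prior implementation, generated_backoff for an AI-generated replacement, and the 30-second cap is an assumed spec rather than anything from a real codebase. The generated version agrees with the old one on the early attempts a hand-written test would likely cover, then drifts past the cap on later ones.

```python
import random

CAP_SECONDS = 30.0  # assumed spec: no retry delay may exceed this


def legacy_backoff(attempt: int) -> float:
    """Hypothetical prior implementation: exponential backoff, capped."""
    return min(CAP_SECONDS, 0.5 * 2 ** attempt)


def generated_backoff(attempt: int) -> float:
    """Hypothetical generated replacement. The cap is applied before the
    final doubling, so late attempts can exceed it."""
    return min(CAP_SECONDS, 0.5 * 2 ** (attempt - 1)) * 2


def behavioral_diff(old, new, inputs):
    """Return (input, old_output, new_output) wherever the two disagree."""
    diffs = []
    for x in inputs:
        old_out, new_out = old(x), new(x)
        if old_out != new_out:
            diffs.append((x, old_out, new_out))
    return diffs


def cap_violations(fn, inputs, cap=CAP_SECONDS):
    """Property check: list every input whose delay exceeds the cap."""
    return [x for x in inputs if fn(x) > cap]


if __name__ == "__main__":
    # Sample attempt numbers well beyond what a unit test would usually cover.
    rng = random.Random(0)
    attempts = sorted({rng.randint(0, 20) for _ in range(50)})
    for attempt, old_out, new_out in behavioral_diff(
        legacy_backoff, generated_backoff, attempts
    ):
        print(f"attempt {attempt}: legacy={old_out} generated={new_out}")
    print("cap violations:", cap_violations(generated_backoff, attempts))
```

A unit test that checks attempts 0 through 3 stays green on both versions; the wider sweep is what surfaces the divergence. In practice a property-based testing library would generate the inputs, but the gate is the same either way: any disagreement with the prior behavior has to be explained by a human before the change merges.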

None of this slows down the engineer who actually used the tool well — read the output, understood it, can defend it. It only catches the case we should want caught: the change nobody can explain trying to clear the pipeline on momentum alone. That's the whole point of a gate. It's invisible to the people doing it right and immovable to the people cutting the corner.
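Here is one way the three conditions could be encoded as a merge check, sketched in Python. Treat it as an illustration under stated assumptions, not a description of any real pipeline: the Change record, its field names, and the routing rule (substantially generated or security-relevant changes must also pass eval gates) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Provenance(Enum):
    HUMAN = "human-authored"
    AI_ASSISTED = "ai-assisted"
    AI_GENERATED = "substantially-ai-generated"


@dataclass
class Change:
    # None means nobody recorded how the change was produced.
    provenance: Optional[Provenance]
    ci_green: bool
    # Named human who attested "I understand this and I stand behind it".
    attested_by: Optional[str] = None
    evals_passed: bool = False
    security_relevant: bool = False


def blocking_reasons(change: Change) -> List[str]:
    """Return every reason the change may not merge; empty means it may."""
    reasons = []
    if not change.ci_green:
        reasons.append("CI is not green")
    if change.provenance is None:
        reasons.append("provenance was not recorded")
    if not change.attested_by:
        reasons.append("no named human attests to understanding the change")
    # Assumed routing rule: heavier scrutiny for generated or sensitive code.
    needs_evals = (
        change.provenance is Provenance.AI_GENERATED or change.security_relevant
    )
    if needs_evals and not change.evals_passed:
        reasons.append("required eval gates have not passed")
    return reasons


def may_merge(change: Change) -> bool:
    return not blocking_reasons(change)


if __name__ == "__main__":
    understood = Change(Provenance.AI_GENERATED, ci_green=True,
                        attested_by="alice", evals_passed=True)
    unexplained = Change(Provenance.AI_GENERATED, ci_green=True)
    print(may_merge(understood))          # True
    print(blocking_reasons(unexplained))  # attestation and eval reasons
```

The engineer who read the output, ran the evals, and put their name on it passes without noticing the check exists. The green-CI change with no attester is the only one that stops, and it stops with a reason a human can act on.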

The challenge

So here's where I'll leave it. Go look at what merged into your main branch in the last month and pick a change at random. Find the person whose name is on the approval and ask them to walk you through why it's correct — not what it does, why it's right. If they can, your controls are working. If they can't, you don't have tech debt. You have code running in production that no human on earth decided was correct, and you found out by accident instead of by design.

AI is going to keep getting better at writing code we can't easily understand. That's not the threat. The threat is letting "I don't understand it" become an acceptable state for code that clears CI. Make comprehension a gate. Unexplained code doesn't ship — not because the model is untrustworthy, but because correctness needs an owner, and that has always been the job.
