Your SOC Metrics Are Vanity Until They Change a Decision

I've sat in a lot of security reviews where the dashboard does the talking. Mean time to detect, trending down. Mean time to respond, trending down. Alert volume, up and to the right because "coverage." Everyone nods. Nobody asks the only question that matters: what decision did any of this change?

That's my test for whether a metric is real or vanity. A real metric changes a decision. It gets a rule retired, a budget moved, an engineer reassigned, a control funded. A vanity metric gets a screenshot in a board deck. Most SOC programs I've seen are quietly building the second thing — a beautiful museum of dashboards that nobody actually operates from — while telling themselves they're building the first.

MTTD and MTTR are averages, and averages hide the truth

Start with the two numbers everyone leads with. MTTD and MTTR are means, and a mean is exactly the wrong summary for security data. Your alert population is not normally distributed. It's a giant pile of low-severity noise that auto-closes in seconds and a thin tail of genuinely dangerous events that take hours or days. Average those together and you get a number that improves every time you add more cheap, fast, meaningless detections. You can drive MTTD down by alerting on more trivial things. That's not progress. That's gaming your own scoreboard.

If you're going to track time-based metrics, track them on the cases that could actually hurt you, and track the distribution — the median and the 95th percentile — not the mean. The tail is where the breach lives. A program that detects the obvious in thirty seconds and misses the lateral movement for three weeks has a gorgeous MTTD and a real problem. The average will never tell you that. The tail will.
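To make that concrete, here is a minimal Python sketch. Every number in it is invented for illustration, and the names (`noise`, `serious`, `percentile`, `summarize`) are mine, not from any SIEM. It shows how padding the queue with fast, trivial alerts pulls the mean down while the median and 95th percentile of the serious cases stay exactly where they were.

```python
import statistics

# Hypothetical detection times in seconds. None of these figures come
# from real data; they only mirror the shape described above.
noise = [30] * 500  # trivial alerts, caught fast
serious = [600] * 16 + [7_200, 86_400, 604_800, 1_814_400]  # last one is three weeks


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct% of n
    return ordered[rank - 1]


def summarize(values):
    """Mean plus the distribution figures that describe the tail."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p95": percentile(values, 95),
        "max": max(values),
    }


all_alerts = summarize(noise + serious)
padded = summarize(noise * 10 + serious)  # same incidents, ten times the noise
tail_view = summarize(serious)            # only the cases that could hurt

print(f"mean, all alerts:   {all_alerts['mean']:.0f}s")
print(f"mean, padded noise: {padded['mean']:.0f}s")
print(f"serious median:     {tail_view['median']:.0f}s")
print(f"serious p95:        {tail_view['p95']}s")
print(f"serious worst case: {tail_view['max']}s")
```

The padded mean looks like a big improvement even though not one serious case was detected any faster. The serious-case figures don't move, which is the point of tracking them.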

Alert fatigue is a math problem you created

Here's the mechanic underneath all of it. Every detection you ship has a precision, the share of its firings that turn out to be true positives, and every analyst has a finite attention budget. When you flood a queue with low-precision detections, you are not adding coverage. You are spending your team's attention on noise, which means the real signal lands in a queue whose analysts are already exhausted. Alert fatigue isn't a morale issue you fix with pizza. It's the predictable output of shipping detections you never measured.

So measure them. Every detection rule should carry a precision number: of the last hundred times it fired, how many were true positives? If you can't answer that for a rule, you don't own that rule — it owns you. And the brutal, freeing truth is that deleting detections is often the highest-leverage thing a SOC can do. A rule that fires forty times a week at three percent precision is a tax on every real investigation. Killing it is a decision. It's measurable. It's the opposite of vanity.
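A rough sketch of that per-rule number, assuming analysts record a true-positive or false-positive disposition when they close each alert. The function names, the rule names, and the 5 percent floor are all hypothetical; pick a floor that fits your own queue.

```python
from collections import defaultdict


def rule_precision(dispositions, window=100):
    """Return {rule_id: precision} over each rule's most recent `window` fires.

    `dispositions` is an iterable of (rule_id, was_true_positive) pairs,
    oldest first, as recorded when an analyst closes an alert.
    """
    outcomes = defaultdict(list)
    for rule_id, was_true_positive in dispositions:
        outcomes[rule_id].append(bool(was_true_positive))
    return {
        rule_id: sum(results[-window:]) / len(results[-window:])
        for rule_id, results in outcomes.items()
    }


def retirement_candidates(precisions, floor=0.05):
    """Rules below the (hypothetical) precision floor, worst first."""
    low = [(p, rule_id) for rule_id, p in precisions.items() if p < floor]
    return [rule_id for p, rule_id in sorted(low)]


# Made-up history: one noisy rule with 3 true positives in 100 fires,
# one healthy rule with 8 in 10.
history = (
    [("noisy_rule", i < 3) for i in range(100)]
    + [("healthy_rule", i < 8) for i in range(10)]
)
precisions = rule_precision(history)
print(precisions)
print(retirement_candidates(precisions))
```

The output of `retirement_candidates` is a list of decisions waiting to be made, not a chart. A rule on that list either gets fixed, gets a written justification for staying, or gets deleted.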

This is also why "we tuned the SIEM" is one of the emptiest phrases in our field. Tuning to reduce volume isn't the goal — you can suppress your way to silence and call it a win right up until you miss something. Tuning should be in service of precision and coverage as a pair, and you should be able to state what you gained and what you gave up. If a tuning change can't be expressed as "this rule went from X precision to Y, and here's the coverage we verified we kept," it wasn't tuning. It was housekeeping.
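One way to force that statement out of a tuning change is to replay labeled events through the rule before and after the edit. The sketch below assumes you keep such a replay set; the event fields, the three example rules, and the `evaluate` helper are illustrative only.

```python
def evaluate(rule, labeled_events):
    """Return (precision, coverage) for a rule over labeled replay data.

    `labeled_events` is a list of (event, is_malicious) pairs. Precision is
    None when the rule never fires. Coverage is the share of known-malicious
    events the rule still catches.
    """
    fired = [is_malicious for event, is_malicious in labeled_events if rule(event)]
    total_malicious = sum(1 for _, is_malicious in labeled_events if is_malicious)
    caught = sum(fired)
    precision = caught / len(fired) if fired else None
    coverage = caught / total_malicious if total_malicious else None
    return precision, coverage


# Hypothetical replay set: 2 malicious PowerShell launches from Word,
# 8 benign ones from Explorer.
replay = (
    [({"process": "powershell.exe", "parent": "winword.exe"}, True)] * 2
    + [({"process": "powershell.exe", "parent": "explorer.exe"}, False)] * 8
)


def before(event):
    """Original rule: any PowerShell launch."""
    return event["process"] == "powershell.exe"


def after(event):
    """Tuned rule: PowerShell launched by Word."""
    return before(event) and event["parent"] == "winword.exe"


def suppressed(event):
    """A rule 'tuned' into silence."""
    return False


for name, rule in [("before", before), ("after", after), ("suppressed", suppressed)]:
    print(name, evaluate(rule, replay))
```

On this toy data the tuned rule raises precision and keeps every known-malicious event, while the suppressed rule produces zero alerts and zero coverage. Volume alone can't tell those two outcomes apart.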

Build a detection-engineering practice, not a dashboard museum

The reframe I keep pushing on my own teams: detection is software, and it deserves a software engineering practice. That sentence changes more than it sounds like.

It means detections live in version control, get code review, and ship through a pipeline, not clicked into a console by whoever had the on-call shift. It means every detection maps to a threat behavior you actually care about, ideally tied to MITRE ATT&CK so coverage is a map and not a vibe. It means you test detections before they go live and keep testing that they still work afterward, because data sources drift and log formats change. A rule that silently stopped firing six months ago is worse than no rule at all: it's a false sense of safety with a green light on it. And it means detections have owners and expiration dates, the same way code has maintainers and deprecation.
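As a sketch of what owners and expiration dates can look like in practice, here is a small metadata record and a check that flags rules for review. The field names, the 90-day silence threshold, the owners, and the dates are assumptions for illustration; the technique IDs are real ATT&CK identifiers used only as examples.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Detection:
    rule_id: str
    owner: str
    attack_technique: str       # MITRE ATT&CK technique ID
    expires: date               # review-or-retire date
    last_fired: Optional[date]  # None if the rule has never fired


def review_reasons(detection, today, max_silence=timedelta(days=90)):
    """Return the reasons, if any, that a detection needs a human look."""
    reasons = []
    if today >= detection.expires:
        reasons.append("expired")
    if detection.last_fired is None or today - detection.last_fired > max_silence:
        reasons.append("silent")
    return reasons


today = date(2024, 6, 1)
rules = [
    Detection("ps_from_office", "alice", "T1059.001", date(2024, 12, 31), date(2024, 5, 20)),
    Detection("rdp_lateral", "bob", "T1021.001", date(2024, 3, 31), date(2023, 11, 2)),
]
for rule in rules:
    print(rule.rule_id, rule.owner, review_reasons(rule, today))
```

A "silent" flag is a prompt to replay a known-bad sample through the rule, not a verdict. Some detections are quiet because the behavior is rare, and some are quiet because the log source changed underneath them. Only a test tells you which.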

The work product of that practice is a backlog, not a wall of monitors. New threat intel comes in, it becomes a detection story with an acceptance test. A rule's precision decays, it becomes a tuning ticket. A gap shows up in your ATT&CK coverage, it becomes a build. Running fintech infrastructure for over 1,500 financial institutions taught me that the difference between a team that scales and one that drowns is whether the work is engineered or improvised. Detection is no different. Improvised detection doesn't scale past the cleverest person on the team, and that person eventually leaves.

The metrics that actually run this practice are unglamorous. Detection coverage against the techniques relevant to your business. Precision per rule, trending. The number of detections shipped, tuned, and retired per quarter — yes, retired is a feature. Time from new threat intel to deployed, tested detection. None of these make a pretty hockey-stick slide. All of them change what someone does next week.
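The coverage number can be as plain as set arithmetic. This sketch assumes you maintain a list of the techniques relevant to your business and know which deployed rules currently pass their acceptance tests. The rule names and the choice of techniques are hypothetical.

```python
def coverage_report(relevant, deployed):
    """Return (coverage_ratio, sorted_gaps).

    `relevant` is a set of ATT&CK technique IDs you care about.
    `deployed` maps rule_id -> (technique_id, acceptance_test_passing).
    Only rules with a passing test count as coverage.
    """
    covered = {technique for technique, passing in deployed.values() if passing}
    if not relevant:
        return None, []
    ratio = len(relevant & covered) / len(relevant)
    return ratio, sorted(relevant - covered)


# Hypothetical inventory.
relevant = {"T1566", "T1078", "T1059.001", "T1021.001"}
deployed = {
    "phish_link_click": ("T1566", True),
    "ps_from_office": ("T1059.001", True),
    "rdp_lateral": ("T1021.001", False),  # test is failing, so it doesn't count
    "dns_tunnel": ("T1071.004", True),    # works, but not on the relevant list
}

ratio, gaps = coverage_report(relevant, deployed)
print(f"coverage: {ratio:.0%}")
print("gaps:", gaps)
```

The ratio is the least useful part. The gap list is the backlog: each entry is either a build or a documented decision to accept the risk.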

So here's the challenge. Pull up your SOC dashboard and go metric by metric, and for each one ask: in the last quarter, what decision did this number actually change? Be honest. The ones that can't answer are decoration, and decoration is expensive — it costs you the attention you could have spent on the metrics that move detection outcomes. Keep the ones that change decisions. Kill the rest. That's not a reporting exercise. That's the start of detection engineering.
