The problem with engineering metrics
Most engineering teams are not under-measured. They have dashboards, velocity charts, deployment frequency graphs, and quarterly OKRs. The problem is not a lack of data; it is a lack of signal.
A signal metric is one that is causally connected to the outcome you care about. When the metric improves, the underlying thing actually gets better. When the metric degrades, something real is going wrong.
A noise metric, which I call an activity metric, measures whether work is happening. Story points completed. Lines of code committed. Tickets closed. These metrics can look healthy while engineering quality deteriorates.
The distinction matters because when you optimize for noise metrics, you can improve the numbers without improving the system. Teams learn quickly which behaviors make the dashboard look good. They start doing those things.
A classification framework
Outcome metrics are hard to game because they measure real-world results: system reliability (actual user impact, not uptime percentage), customer value delivered (did the feature change user behavior?), and time to resolve incidents (the clock users experience, not the on-call clock).
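To make the "real clock versus on-call clock" distinction concrete, here is a minimal Python sketch. It assumes a hypothetical incident record with three timestamps, and it assumes the on-call clock starts when the page is acknowledged; both the field names and that interpretation are illustrative, not taken from any particular incident tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative assumptions.
    user_impact_start: datetime  # when users were first affected
    acknowledged_at: datetime    # when on-call acknowledged the page (assumed)
    resolved_at: datetime        # when user impact ended


def real_clock(incident: Incident) -> timedelta:
    """Time users actually experienced the problem, including detection delay."""
    return incident.resolved_at - incident.user_impact_start


def on_call_clock(incident: Incident) -> timedelta:
    """Time from acknowledgment to resolution; hides how long detection took."""
    return incident.resolved_at - incident.acknowledged_at


incident = Incident(
    user_impact_start=datetime(2024, 3, 1, 9, 0),
    acknowledged_at=datetime(2024, 3, 1, 9, 40),
    resolved_at=datetime(2024, 3, 1, 10, 0),
)
print(real_clock(incident))     # 1:00:00
print(on_call_clock(incident))  # 0:20:00
```

In this made-up incident the on-call clock reports twenty minutes, while users were affected for an hour. A team optimizing the on-call clock could look excellent while detection stays slow.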
Indicator metrics correlate with outcomes and are useful as leading indicators, but they become noise when treated as targets. DORA metrics (deployment frequency, lead time for changes, change failure rate, and mean time to restore, or MTTR) fall here. They signal engineering health when read honestly. They become theater when teams optimize for them directly.
Activity metrics should rarely be management metrics. Velocity, story points, and PR count measure output, not outcomes. They are useful for team self-organization and capacity planning. They are harmful as performance metrics because they incentivize the wrong behaviors.
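One way to keep the three tiers honest is to record each metric's tier and flag any use the tier does not support. The sketch below assumes a hypothetical catalog of metric names and a single `used_as_target` flag standing in for "used as a target or performance metric"; it illustrates the classification rather than prescribing a tool.

```python
from enum import Enum


class Tier(Enum):
    OUTCOME = "outcome"      # measures real-world results; hard to game
    INDICATOR = "indicator"  # correlated with outcomes; noisy once targeted
    ACTIVITY = "activity"    # measures output; for team self-organization


# Hypothetical catalog mapping metric names to tiers (names are illustrative).
CATALOG = {
    "user_facing_error_minutes": Tier.OUTCOME,
    "deployment_frequency": Tier.INDICATOR,
    "change_failure_rate": Tier.INDICATOR,
    "story_points_completed": Tier.ACTIVITY,
    "prs_merged": Tier.ACTIVITY,
}


def review_usage(metric: str, used_as_target: bool) -> str:
    """Return a warning when a metric is used in a way its tier does not support."""
    tier = CATALOG[metric]
    if used_as_target and tier is Tier.INDICATOR:
        return f"{metric}: indicator used as a target; expect it to drift from the outcome"
    if used_as_target and tier is Tier.ACTIVITY:
        return f"{metric}: activity metric used as a target; rewards output, not outcomes"
    return f"{metric}: ok"


print(review_usage("story_points_completed", used_as_target=True))
```

The rule is deliberately blunt: only outcome metrics pass as targets, while indicator and activity metrics remain available for reading trends and planning capacity.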
The four questions for any metric
Before adding a metric to your engineering dashboard, apply this test:
1. What decision does this metric inform? If you cannot name a specific decision, the metric produces noise. Every metric should map to a decision someone makes differently when the number changes.
2. What behavior does tracking this metric incentivize? Assume engineers are rational and will optimize for the metric as written. Is the resulting behavior what you want? If the metric is “PRs merged per week,” the incentivized behavior is small, frequent PRs — which is sometimes good and sometimes means important work is being avoided.
3. Is this metric lagging or leading? Lagging metrics tell you what happened; leading metrics tell you what is likely to happen. Both are useful, but for different purposes: lagging metrics explain the past, while leading metrics let you intervene before the outcome is locked in.
4. Is this metric independent or correlated? Presenting several metrics that move together as if they were independent signals is misleading. If your deployment frequency, lead time, and story point velocity all improve at once, you may be measuring the same thing three ways.
What a signal-heavy metric system looks like
The instinct to add more metrics comes from a reasonable place: more data feels like more visibility. But more data without signal density increases cognitive load without improving decisions. The goal of an engineering metric system is not completeness — it is the minimum set of metrics that lets you make better decisions faster.
The measurement trap
The highest-risk moment for any metric system is when things go well. During a good quarter, teams often add more metrics to track what is going well. Dashboards grow. By the time a real problem appears, there is so much data that the signal is buried in noise.
Build the system for the hard quarter, not the easy one.