Goodhart’s law says “When a measure becomes a target, it ceases to be a good measure.” That is, when people are rewarded on the basis of some metric, they’ll learn how to improve that metric, but not necessarily in a way that increases what you’re after. Here are three examples of Goodhart’s law related to software development.
- If you use lines of code to measure software developer productivity, Goodhart’s law predicts that you’ll get more lines of code, but not more productivity. In fact, you decrease productivity by discouraging code reuse.
- If you pay developers to fix bugs, you may get a sort of cobra effect: Bug bounties can incentivize breeding bugs the way cobra bounties incentivized breeding cobras.
- If you evaluate developers by number of comments, you’re likely to get verbose, unhelpful comments that make code harder to maintain.
Despite the flaws of such metrics and their potential for perverse incentives, I claim it's still worth looking at metric outliers.
When I managed a software development team, I ran software that computed the complexity [1] of all the functions in our code base. One function was 100x more complex than anything else anyone had written. That drew my attention to a problem I was unaware of, or at least had underestimated.
If one function is 50% more complex than another, as measured by the software metric, that does not mean that the former is more complex in terms of the mental burden it places on human readers. But a difference of 10,000% is worth investigating. Maybe there’s a good explanation—maybe it’s machine-generated code, for example—but more likely it’s an indication that there’s a problem.
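As a rough illustration of the kind of scan described above, here is a minimal sketch in Python using only the standard library. It approximates McCabe complexity by counting branch points in each function and flags anything far above the median. The node types counted and the 10× threshold are arbitrary choices for illustration, not a description of the software mentioned above.

```python
import ast
import sys
from pathlib import Path

# Constructs that add a decision point. This list, like the rest of the
# sketch, is an approximation rather than a full McCabe implementation.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                ast.ExceptHandler, ast.BoolOp, ast.Assert)

def cyclomatic_complexity(func: ast.AST) -> int:
    """Roughly McCabe complexity: 1 plus the number of decision points.
    (Nested functions count toward their parent, another simplification.)"""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))

def scan(root: str):
    """Yield (file, function name, complexity) for each function under root."""
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                yield str(path), node.name, cyclomatic_complexity(node)

if __name__ == "__main__":
    results = sorted(scan(sys.argv[1]), key=lambda r: r[2], reverse=True)
    median = results[len(results) // 2][2] if results else 1
    for path, name, cc in results[:10]:          # ten most complex functions
        flag = "  <-- outlier" if cc >= 10 * median else ""
        print(f"{cc:4d}  {name}  ({path}){flag}")
```

Pointed at a project directory, this lists the ten most complex functions; an outlier like the one described above would be hard to miss.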
Code reviews are far better than code metrics. But code metrics can suggest which code should be a priority to review.
[1] McCabe complexity, a.k.a. cyclomatic complexity. Essentially the number of linearly independent paths through a function.