Counting Everything (Except What Counts)
Our addiction to metrics has led to an unwarranted myth of confidence
Striving to better, oft we mar what's well.
[King Lear, Act 1, Scene 4]
The World of Scores
We live in a world where only what is measurable has meaning. This is deeply and spiritually disturbing.
I meet two friends every Sunday at 7 a.m. to surf, a ritual that dates back almost two decades. Our break is one of the most consistently beautiful places on the planet. The lagoon near the break is on the migratory path of pelicans, cormorants, grey whales, loons, terns, and more, and is the nesting ground for countless shorebirds. It is as close as possible to the earth’s fertile, churning anima and a place for science and wonder.
After a recent surf session, the conversation focused on my friend’s Oura smart ring. He raved about the Oura sleep score, an index from 1 to 100 that distills his sleep quality and integrity, including separate measures of efficiency, restfulness, and more. What a wonder! Having optimized our efficiency in our waking hours, we can now focus on our embarrassingly inefficient … sleep.
The Oura score was an abstraction more vital than reality even in perfection’s ground zero. And this score is only one of hundreds available to us, the indices of an ideal life.
The Myth of Confidence
The profusion of scores and life metrics, and our passionate attachment to them, constitute the myth of confidence, our belief that measurability equals certainty, and that certainty is our birthright as citizens of science and technology.
Certainty in life is not reasonable, yet we not only pursue but also demand it. Why?
One might consider this the Hubermanizing of our culture. His podcasts are a liturgy of terminology, clearly and convincingly articulated and coupled with the recitation of contemporary research. He incantates the new Holy Trinity of terminology, journal research, and Stanford. He exudes confidence.
The domains of math, physics, chemistry, and biochemistry have created an expectation of certitude and precision. These fields, at least in their more pedestrian realms, have the pleasing characteristic of solving with consistency and accuracy and a dogged resistance to interpretation and contingency.
Even in the messiness of thermodynamics and quantum theory, we can still productively predict aggregate behavior even if we can’t isolate every interaction. We can be confident.
Unfortunately, the problems we care about in our daily lives, those concerning society, nutrition, health, and the economy, belie the quiet certitude of what they call the “hard” sciences. It may be more accurate to presume that the certainty inherent in math, chemistry, and physics is weirder and more anomalous than the contingency of the softer sciences. The hard sciences are easier to accept, but only hold in local contexts.
The myth of confidence rests on the presumption that quantifiability equals knowledge. The world and its systems may be complex, but if we find the right things to measure and collect enough data, we can render them predictable and exploitable.
Learning from Walmart
And why should science not inspire confidence? Our great research institutions sent men to the moon, created Artificial Intelligence and the Internet, and promise to cure cancer in our lifetimes. If their standard of truth is numerical, why not adopt it in our personal affairs?
Numerical methods can be misleading because the real world is tricky (especially when you include people). Let me share an experience that shaped my thinking.
In the early 2000s, I ran a company that automated A/B testing for e-commerce companies. Our software used fractional factorial methods to test combinations of factors like copy, color, and images on web pages and evaluate their impact on the likelihood of purchase. For example, we could show a picture of a happy family to one subset of the test population and a product image to another. We would track each population and determine which experience had a marginally better conversion rate.[1]
Walmart hired us to conduct a test on their home page, which was one of the most visited websites in the world. The scale of this test population made it likely the most extensive randomized test executed up to that time.
The test launched early Monday morning, and within four hours, it had reached a p-value < 0.05, conventionally considered statistically significant at the 95% confidence level. This means that if there were truly no difference between the branches (the null hypothesis), random chance alone would produce a difference at least this large less than 5% of the time. It was not perfectly predictive of the future, but it was strong enough to make a justifiable claim that our test's "winning" branch would predictably drive more purchases.
95% is not absolutely predictive, but it is adequate for published research.
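For readers who want to see the arithmetic behind a claim like "p < 0.05," here is a minimal sketch of one common approach, a pooled two-proportion z-test. It is not the fractional factorial software we actually ran, and the visitor and purchase counts are invented for illustration; the function names are mine.

```python
import math

def conversion_rate(purchases: int, visitors: int) -> float:
    """Fraction of visitors who went on to purchase."""
    return purchases / visitors

def two_proportion_p_value(purchases_a: int, visitors_a: int,
                           purchases_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled z-test (normal approximation)."""
    rate_a = conversion_rate(purchases_a, visitors_a)
    rate_b = conversion_rate(purchases_b, visitors_b)
    # Under the null hypothesis both branches share one underlying rate.
    pooled = (purchases_a + purchases_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts (NOT the Walmart data): 2.00% vs. 2.15% conversion.
p = two_proportion_p_value(2_000, 100_000, 2_150, 100_000)
print(f"p-value: {p:.4f}, significant at 0.05: {p < 0.05}")
```

A small p-value here says only that the observed gap would be unusual if the two branches were truly identical. It says nothing about whether the gap will persist tomorrow.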
We continued executing the test. Later in the afternoon, the winning branch changed, and the confidence level, both calculated and experienced, plummeted. High confidence is not an absolute expectation of predictability, but it was unsettling to have an experiment with hundreds of thousands of participants behave almost chaotically.
After the test, we worked with Princeton University specialists in experimental design and statistics and hedge fund “quants” to determine whether the math or methods were suspect and to see whether there were different approaches we could employ. They came up empty on all counts. Our test failed because our goal was to find an answer, a neat, reliable heuristic. We wanted to affix reality to a pinboard like a butterfly: static, placid, and, unfortunately, dead.
Occam’s Broom
The real world is unsympathetic to reduction. We had more data than ever assembled in a marketing test and were still radically uncertain. Our ruthless culling of variables was intended to provide insight; instead, we were likely victims of “Occam’s Broom.”
Sydney Brenner coined “Occam’s Broom,” the counterpoint to Occam’s Razor,[2] to describe how, during analysis, inconvenient facts are "swept under the carpet" to present a tidy but misleading narrative. Our compulsion for simplification would be pathological if it were not so essential to our culture.
When we ran the Walmart test, we presumed that a massive, randomized trial population would eliminate the influence of factors like time of day, source of traffic, weather, regionality, or who knows what. Randomization was worse than a broom; it was a giant vacuum to remove inconvenient perturbations in our clean test.
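A toy illustration of how a swept-away factor can matter: suppose, purely hypothetically, that morning visitors respond better to one branch and afternoon visitors to the other. The rates and volumes below are invented, and this is not a diagnosis of what actually happened in our test. It only shows that a cumulative "winner" can flip even when every visitor is randomized fairly.

```python
# Hypothetical segments: (label, visitors per branch, branch A rate, branch B rate).
# Randomization splits traffic evenly, so each branch sees the same visitor count.
segments = [
    ("morning",   50_000, 0.020, 0.024),
    ("afternoon", 50_000, 0.024, 0.019),
]

def cumulative_winners(segments):
    """Return the leading branch after each segment, using expected purchases."""
    purchases_a = purchases_b = 0.0
    winners = []
    for label, visitors, rate_a, rate_b in segments:
        purchases_a += visitors * rate_a
        purchases_b += visitors * rate_b
        # Equal visitor counts mean comparing purchases compares conversion rates.
        winners.append((label, "A" if purchases_a > purchases_b else "B"))
    return winners

for label, winner in cumulative_winners(segments):
    print(f"after {label}: branch {winner} leads")
```

Randomization balances who sees each branch. It cannot make the audience at 9 a.m. behave like the audience at 3 p.m.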
This tendency to oversimplify complex systems doesn't just affect large-scale experiments like our Walmart test. It shapes how we approach personal optimization in our daily lives.
Hills and Hubris
Our failure at Walmart wasn't unique. Our faith in precise measurement and predictable influence in that experiment mirrors our narrow perspectives when self-optimizing. We want to sell more, look younger, live longer, or make more money, the “more-ing” of what we already desire. To do so, we look for obvious, available strategies: eat less, work out more, or run more often, and we pursue these simple correlations.
This leads to two types of errors: local maxima and the “static world” fallacy.[3]
In the first, we attempt to improve a value by evaluating whether different approaches bring us closer to or farther from our goal, and pursuing those that are favorable, an approach called hill climbing. We find a peak, but because we headed in the wrong direction, it is just a foothill, not the summit we sought.[4]
In the second, we presume a static world and ignore that our effort in pursuing a metric changes the system. For example, if we consume fewer calories to lower our weight, our body responds by lowering our metabolism, negating our efforts.
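The first error, hill climbing onto a foothill, can be sketched in a few lines. This is a minimal version under my own assumptions: a one-dimensional made-up landscape with a small peak near x = -1 and a taller one near x = 2, and a climber that only ever takes the best neighboring step.

```python
def landscape(x: float) -> float:
    """Invented terrain: a foothill (height 1) near x = -1, a summit (height 3) near x = 2."""
    return max(1 - (x + 1) ** 2, 3 - (x - 2) ** 2)

def hill_climb(f, x: float, step: float = 0.1, max_iters: int = 10_000) -> float:
    """Greedy hill climbing: move to the better neighbor until neither improves."""
    for _ in range(max_iters):
        # Listing x first means ties keep us where we are.
        best = max((x, x - step, x + step), key=f)
        if best == x:
            break  # No neighbor is higher: we are on a peak, local or global.
        x = best
    return x

foothill = hill_climb(landscape, -2.0)  # starts on the wrong slope
summit = hill_climb(landscape, 1.0)     # starts on the right one
print(f"from -2.0: x = {foothill:.2f}, height = {landscape(foothill):.2f}")
print(f"from  1.0: x = {summit:.2f}, height = {landscape(summit):.2f}")
```

Both runs follow the same rule and both end on a peak. Only the starting point decides whether it is the summit, and from the top of the foothill every step looks like a step down.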
Perhaps the more straightforward way of describing both is that they represent a pedestrian form of hubris, a lack of respect for the world's complexity, and its persistent tendency to subvert our intentions. As H.L. Mencken quipped, “Every complex problem has an answer that is clear, simple, and wrong.”
Behind the Numbers
We strive for the certainty of numbers because we cannot or will not find certainty in life. We yearn for a spreadsheet that will make us feel safe.
We believe in the authority of institutions, technology, and numerical precision. It is hypnotic and incontrovertible, and we want it to be true.
We benefit when we can deliberately use measurement as a tool for seeing and discerning. Statistical analysis can find hidden value in a sea of observations that may be obscured to the naked eye. A metric can be an invitation to inquiry or an opportunity to see something hard to see, like changes in atmospheric conditions. It can inform decisions: What should I wear based on the temperature?
When we vacillate, numbers can encourage, showing progress when life’s trials are particularly onerous.
The problem is not that measurement is worthless. It’s that numbers give false confidence. We believe that numbers know or, more precisely, we reject anything unmeasurable as “not knowledge.” The pernicious truth is that at any level of meaningful complexity, the predictive validity of numbers falls precipitously.
But often metrics obscure intuition. We index ourselves, aligning to a “real feel” index yet ignoring how it really feels. We get a shorthand for the quality of experience that loses the essential quality of the experience.
We do not want any numbers eliminated, as they are an investment. Without our numbers, we risk losing meaning. What does being a researcher, physician, or financier mean if we cannot prove our contribution without a scorecard?
Can we learn how to be without always having to be effective?
The Way Forward
This myth of confidence is the closest thing we have to a shared societal system in the current era. The postmodern era removed the idea of ethical absolutes, essential and unchanging rules, and we have scuttled about replacing them with empirical certainty. We have traded The Truth according to God for the facts according to MIT.
We are drunk on studies and meta-studies that show with precision how we think, what we should eat, exactly how long we should meditate, and even which people shouldn’t look inward lest they drown. We know precisely which serotonin receptors are associated with addiction and that our screens are dopamine hijackers.
And it may all be nonsense. If we added all the longevity improvements we should expect from behavior change, we would double our lifespan, which is an obviously dubious proposition. This is the myth of confidence.
This is not to say we cannot take value or solace from metrics.
Numbers can help us overcome paralysis and point us in the right direction. We should measure, watch, and count because life can be onerous. We should find ways to mark our progress, even simple ones like counting birthdays or years of sobriety.
However, we should not confuse measurement with experience. Any models we use should be a mode of inquiry or how we test the dictum of conformity. We must practice intuition, if only to subvert bias. Otherwise, we are just hill-climbing on shifting sands, and our “more-ing” results in our unmooring.
Metrics are not meaning.
[1] Conversion rate is a standard metric for e-commerce and is calculated as the percentage of visitors who subsequently purchase. For example, if there are 100 visitors, and 2 eventually purchase, that is a 2% conversion rate.
[2] Occam’s Razor is the principle that the simplest explanation for a phenomenon will likely be the most accurate.
[3] The more serious names for this are the Ceteris Paribus fallacy or Goodhart’s law: When a measure becomes a target, it ceases to be a good measure. It describes how optimizing for a metric changes the system in ways that make the metric less meaningful.
[4] This is referred to as a local maximum.

Great piece. First, thanks for intro’ing me to Occam’s Broom. Provides a useful schema for pointing out analytical flaws. Second, reading this, I had the thought that quantification is a new religion for many people. At the same time, there’s a growing trend to reject research, studies, and data as untrustworthy because they’re either manipulated (by Big Pharma etc) or because they don’t account for spiritual knowledge/intuition (I’m seeing this more in the motherhood discourse). That divergence is interesting to me. Is simply every last thing becoming increasingly polarized these days? There is no happy medium to be found anywhere, it feels like (though even as I say this it feels like an overreading).