The Sabermetric Bias

My first column for Grantland is now online. It's about how sabermetrics, while extremely useful in many situations, can also lead us to conclude that it's more useful than it actually is. This is largely because people tend to overweight variables that can be easily quantified. Comments are not yet enabled on Grantland, so feel free to hate on me (but in a constructive fashion!) in the comment section below:

Buying a car is a hard decision. There are just so many variables to think about. We've got to inspect the interior and analyze the engine, and research the reliability of the brand. And then, once we've amassed all these facts, we've got to compare different models.

How do we sift through this excess of information? When consumers are debating car alternatives, studies show that they tend to focus on variables they can quantify, such as horsepower and fuel economy. (Psychologists refer to this as the "anchoring effect," since we anchor our decision to a number.) We do this for predictable reasons. The amount of horsepower directly reflects the output of the engine, and the engine seems like something that should matter. (Nobody wants an underpowered car.) We also don't want to spend all our money at the gas station, which is why we get obsessed with very slight differences in miles per gallon ratings.

Furthermore, these numerical attributes are easy to compare across cars: All we have to do is glance at the digits and see which model performs the best. And so a difficult choice becomes a simple math problem.

Unfortunately, this obsession with horsepower and fuel economy turns out to be a big mistake. The explanation is simple: The variables don't matter nearly as much as we think. Just look at horsepower: When a team of economists analyzed the features that are closely related to lifetime car satisfaction, the power of the engine was near the bottom of the list. (Fuel economy was only slightly higher.) That's because the typical driver rarely requires 300 horses or a turbocharged V-8. Although we like to imagine ourselves as Steve McQueen, accelerating into the curves, we actually spend most of our driving time stuck in traffic, idling at an intersection on the way to the supermarket. This is why, according to surveys of car owners, the factors that are most important turn out to be things like the soundness of the car frame, the comfort of the front seats and the aesthetics of the dashboard. These variables are harder to quantify, of course. But that doesn't mean they don't matter.

But this is not a column about cars. My worry is that sports teams are starting to suffer from a version of the horsepower mistake. Like a confused car shopper, they are seeking out the safety of math, trying to make extremely complicated personnel decisions by fixating on statistics. Instead of accepting the inherent mystery of athletic talent — or at least taking those intangibles into account — they are pretending that the numbers explain everything. And so we end up with teams that are like the worst kind of car. They look good on paper — so much horsepower! — but they fail to satisfy. The dashboard is ugly, the frame squeaks, and the front seats make our ass hurt.

This is largely the fault of sabermetrics. Although the tool was designed to deal with the independent interactions of pitchers and batters, it's now being widely applied to team sports, such as football and basketball. The goal of these new equations is to parse the complexity of people playing together, finding ways to measure quarterbacks while disregarding the quality of their offensive line, or assessing a point guard while discounting the poor shooting of his teammates. The underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league.

In many respects, sabermetrics has dramatically improved personnel decisions. By relying on unusual measurements of performance, such as base runs and plus-minus ratings, teams have been able to identify neglected talent, whom they can sign on the cheap. Sabermetrics has also helped sports executives double-check their instincts. Instead of blindly trusting some errant whim, and thus making a terrible trade or picking the wrong free agent, they can consult the math. If the Giants had trusted the numbers, for instance, they wouldn't be saddled with Aaron Rowand's five-year, $60 million contract. (He batted .230 last year.) They would have realized that his OBP and OPS are pretty mediocre, especially once his two outlier seasons are taken into account.

For a nerd like me, this quantification of sports has been tremendous fun. Thanks to obsessive websites, even the casual fan now has access to statistical tools that would have boggled the mind of a GM 10 years ago. Sabermetrics has also transformed the act of being a spectator, so that watching a game is no longer just about cheering for our hometown team. The numbers have given us a whole new way to think about sports, elevating the conversation beyond disappointed groans, ecstatic high-fives, and subjective opinions.

But sabermetrics comes with an important drawback. Because it translates sports into a list of statistics, the tool can also lead coaches and executives to neglect those variables that can't be quantified. They become so obsessed with the power of base runs that they undervalue the importance of not being an asshole, or having playoff experience, or listening to the coach. Such variables are the sporting equivalent of a nice dashboard. They can't be quantified, but they still count.

This is the moral of the Dallas Mavericks. By nearly every statistical measure, the Mavs were outmanned by most of their playoff opponents. (According to one statistical analysis, the Los Angeles Lakers had four of the top five players in the series. The Miami Heat had three of the top four.) And yet, the Mavs managed to do what the best teams always do: They became more than the sum of their parts. They beat the talent.

Consider the case of J.J. Barea. During the regular season, the backup point guard had perfectly ordinary statistics, averaging 9.5 ppg and shooting 44 percent from the field. His plus/minus rating was slightly negative. There was no reason to expect big things from such a little player in the playoffs.

And yet, by Game 4 of the NBA Finals, Barea was in the starting lineup. (This promotion came despite the fact that he began the Finals with a 5-for-23 shooting slump and a minus-14 rating.) What Dallas coach Rick Carlisle wisely realized is that Barea possessed something that couldn't be captured in a scorecard, that his speed and energy were virtues even when he missed his layups (and he missed a lot of layups), and that when he made those driving floaters their value exceeded the point score. Because nothing messes with your head like seeing a guy that short score in the lane. Although Barea's statistics still look pretty ordinary — his scoring average fell in the Finals despite the fact that he started — the Mavs have declared that re-signing him is a priority. Because it doesn't matter what the numbers say. Barea won games.

I'm thinking here of a Philip Roth metaphor. When asked by David Remnick, in a 2000 New Yorker profile, how he felt about a cramped literary interpretation of one of his novels, Roth busted out a sports analogy. He imagined going to a baseball game with a little boy for the very first time. The kid doesn't understand what's happening on the field, and so his dad tells him to watch the scoreboard, to keep track of all the changing numbers. When the boy gets home someone asks him if he had fun at the game:

"It was great!" he says. "The scoreboard changed thirty-two times and Daddy said last game it changed only fourteen times and the home team last time changed more times than the other team. It was really great! We had hot dogs and we stood up at one point to stretch and we went home."

If that little kid were around today, he'd be obsessed with sabermetrics. He'd almost certainly win his fantasy league, but he'd miss the point of the game. Sure, he wouldn't have squandered center field on Rowand, but he also wouldn't have started Barea or bet on the Mavs. His car would have way too much horsepower and shitty seats.

Here's my problem with sabermetrics — it's a useful tool that feels like the answer. If we were smarter creatures, of course, we wouldn't get seduced by the numbers. We'd remember that not everything that matters can be measured, and that success in sports (not to mention car shopping) is shaped by a long list of intangibles. In fact, we'd use the successes of sabermetrics to focus even more on what can't be quantified, since our new statistical tools take care of the stats for us. We are finally free to think about how those front seats feel.

But that's not what happens. Instead, coaches and fans use the numbers as an excuse to ignore everything else, which is why our obsession with sabermetrics can lead to such shortsighted personnel decisions. After all, there is no way to quantify the fierce attitude of a team that feels slighted, or the way even the best players can be undone by the burden of expectations, or how Kendrick Perkins meant more to the Celtics than his rebounding stats might suggest.

For reasons that remain mysterious, some teammates make each other much better and some backup point guards really piss off Ron Artest. These are the qualities that often determine wins and losses, and yet they can't be found on the back of a trading card or translated into a short list of clever equations. This is the paradox of sports statistics: What the math ends up teaching us is that sports are not a math problem.

For a smart dissenting view, check out Bill Petti's critique of my critique. My quick response is that I agree with 95 percent of what he says. I particularly agree with this line:

Data and statistics are not to blame for bad decisions--their misapplication is. Lehrer is trying to get at this point, but his misinformed broadside against sports analytics makes it real easy to miss.

That's exactly right, of course. My sole point is that our newfound reliance on data and statistics naturally leads to their misapplication. Because we're so enamored with the numbers, we tend to undervalue what can't be compressed into numerical form, even as we pay lip service to the lingering importance of intangibles. This is a cognitive bias we all need to watch out for.

PS. A few people have suggested that I erred in choosing the Mavs as my primary example because they relied heavily on statistics during the season. But that was my point! Here is the most "forward" thinking team in the league, and yet when it came to their lineup in the last three games of the Finals they chose to start a player who, at least according to every conventional statistical analysis, didn't look so hot. But Carlisle wisely realized that Barea had other things going for him. As I tried to make clear in the piece, the stats leave lots of stuff out, even as they conspire to convince us otherwise. Perhaps sports executives and coaches are a unique subset of human beings who are somewhat resistant to the anchoring effect and the overweighting of quantified variables, in which case they needn't concern themselves with the bias I'm talking about. Perhaps. But probably not.

PPS. The best rebuttal of the article comes from Tom Haberstroh, over at ESPN. He makes two main points. The first is that the decision to start Barea was probably due to the influence of Roland Beech, the best-known sabermetrician of basketball:

Although we can’t know for sure that Beech was the brains behind Barea’s insertion into the lineup, we know he’s in the middle of every on-court personnel decision. After all, this is why he’s on the staff. Sabermetricians shout the hazards of small sample size so it makes sense that Beech and the Mavericks wouldn’t lend too much credence to Barea’s minus-14 plus-minus over a three-game sample or his lukewarm field goal percentage. The statistics that the Mavericks use are far more sophisticated and granular than the stuff we see on the television or on the internet.

It's possible, of course, that Haberstroh is right. Maybe the Mavs are relying on statistical models that exceed what Beech put together at 82games.com, and maybe these pro models (unlike every model used by us rank amateurs) statistically demonstrated the value of starting Barea despite his mediocre numbers. But that strikes me as a bit of a black box argument. If you believe in stats, it's not enough to say that Barea must have been the statistically-minded call, when we have no idea what those stats might be. (And when the stats we do have, such as +/-, suggest he wasn't.) The alternative, I guess, is that Carlisle did what head coaches have always done, and started the guy he wanted to start, numbers be damned. I'm still sticking with my argument that a big part of the value of Barea is psychological: that little dynamo pisses defenses off.

Haberstroh's second argument has been popping up on all the sabermetrics sites, but he says it the best:

I have not met a sabermetrician who believes that there is nothing to be gained outside of the realm of analytics. Intangibles exist and they have power, even if sabermetricians have struggled to pinpoint and measure that power. No sabermetrician would honestly believe that they are all-knowing. That is precisely why they have strapped themselves in for this quest for objective information.

And I'm sure all those car consumers, if asked, would say that they care about the car seats, too. But that was the point of my column. Even when we pay lip service to intangibles - and everyone does - those intangibles are still mentally devalued by our newfound reliance on numbers. We can't help it. There's just no way to plug the intangible variable (say, the quirks of personality, or the comfort of a front seat) into our rigorous model. And so what do we do? Well, if you believe the decision-making literature, what we often do is neglect that hunch, that sly intuition, that errant feeling, since it suddenly seems so unserious. That's just the way we think.