Nov 27, 2014 7:00 PM

How to massage statistics

This article was taken from the January 2015 issue of WIRED magazine.

Gary Smith, professor of economics at Pomona College in California, says computers have made it much easier to fiddle with numbers. "We don't see the eight hours spent massaging them -- we just see the published result," he says. His book Standard Deviations (Gerald Duckworth & Co) breaks down the tricks of statistical manipulation.

Find random correlations

"If you look at two things that increase over time, it could be beer sales and marriages, shoe sales and births, these things will tend to be highly correlated -- say a correlation of 0.9 or higher," says Smith. "It's not necessarily the case that these two things have anything to do with each other." Any two things that grow as the population increases can be correlated, but this doesn't show that increased drinking causes more marriages, or that more marriages leads to increased drinking.

Create a theory around the data

"This is something common to self-improvement and management books," says Smith. "You take a selection of, say, successful marriages, look at what they share, then claim these things are the key to their success. But you'll always find something in common."

The right thing to do is to predict in advance which traits will be shared. Then take a selection of newlyweds, some who have these traits and some who don't, and follow them forwards to see if your prediction pans out.
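
A toy simulation shows why hindsight always turns up "something in common". Everything here is hypothetical: each couple gets 50 yes/no traits by coin flip, so no trait has anything to do with success. The code picks whichever trait is most common among 20 "successful" couples, then checks that same trait in a fresh sample, which stands in for making the prediction first and testing it afterwards.

```python
import random


def random_couples(n_couples, n_traits, rng):
    """Each couple is a list of yes/no traits assigned by coin flip."""
    return [[rng.random() < 0.5 for _ in range(n_traits)]
            for _ in range(n_couples)]


def trait_share(couples, trait):
    """Fraction of couples that have the given trait."""
    return sum(couple[trait] for couple in couples) / len(couples)


rng = random.Random(7)
N_TRAITS = 50  # hypothetical number of traits examined

# Step 1: look back at 20 "successful" couples and pick the most shared trait.
successful = random_couples(20, N_TRAITS, rng)
best_trait = max(range(N_TRAITS), key=lambda t: trait_share(successful, t))
hindsight_share = trait_share(successful, best_trait)

# Step 2: check the same trait in a fresh sample it was not chosen from.
fresh = random_couples(1000, N_TRAITS, rng)
fresh_share = trait_share(fresh, best_trait)

print(f"share of the 'key' trait among the successful couples: {hindsight_share:.0%}")
print(f"share of the same trait in a fresh sample:             {fresh_share:.0%}")
```

With enough traits to choose from, one of them will look like a secret of success by chance alone. In the fresh sample it should fall back to roughly half, which is all a coin flip can offer.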

Prune the data

"This works if you already have a theory that you want to support and you find the data doesn't quite do what you need," says Smith.

Researchers at the University of California were taken with the idea that Chinese and Japanese aversion to the number four might lead to a higher incidence of heart attacks on the fourth of the month. The data showed no relationship, but the researchers cut out various categories of coronary death, reporting only a selection that supported their claim.

Ignore self-selection bias

"In the social sciences our experiments often involve categorising people, observing their behaviour, and then trying to draw conclusions from this," says Smith. The problem is that people aren't truly randomly assigned to these groups. For example, the reason a graduate's life turns out a certain way could be because they are the sort of person who makes the choice to go to university, rather than because of their going to university itself.

This article was originally published by WIRED UK