The Big-Data Interview: Making Sense of the New World Order

Welcome to the Big Data era. A lot of people — most notably computer companies — talk about Big Data these days, but very few people seem to understand what it means. Enter Victor Mayer-Schonberger and Kenneth Cukier and their new book, Big Data: A Revolution That Will Transform How We Live, Work, and Think.
Victor Mayer-Schonberger. Photo: Rob Judges

In April 2003, British and U.S. researchers declared the Human Genome Project complete. This decade-long computational marathon marked the first time that anyone had mapped out the sequence of the more than 3 billion chemical building blocks that make up human DNA.

It was a pioneering breakthrough in computer science and biology. It was also an early “Big Data” problem — a computational challenge that calls for a supercomputer, not an Oracle database, to solve. Welcome to the Big Data era. Today, processing power has advanced to the point where a human genome could be sequenced in a day. And with more and more of the world being digitized — everything from Google Street View images to our history of Facebook Likes — a lot of people are talking about Big Data these days.

Enter Victor Mayer-Schonberger and Kenneth Cukier and their new book, Big Data: A Revolution That Will Transform How We Live, Work, and Think.

As the title indicates, Mayer-Schonberger, an Oxford professor, and Cukier, an editor with The Economist, are excited by Big Data, but their book is more than simple sideline cheerleading. It’s a nuanced and remarkably readable account of the technological changes that have made the Big Data era possible, and a primer on many of the interesting things that are happening at the intersection of powerful computer processing, machine learning, and data analytics. They cover everything from Google’s thirst for new data to mine to Steven Levitt’s data-driven analysis of match fixing in professional Sumo wrestling.

We caught up with Mayer-Schonberger and Cukier on the telephone to discuss their new book, which launches tomorrow. We wanted to know if Big Data is really changing our brains — and they gave us a few answers. The following is an edited transcript of that conversation.

Wired: Do you like the expression Big Data? Obviously, it’s the title of your book, but there’s a sense among a lot of people that work in the field that it’s an over-used term.

Kenneth Cukier: The term is very exposed now. There’s no doubt about it. But it’s still very useful for industry as a way to talk about it and to understand it and to think about it.

The name is very imperfect. Of course it is. And the biggest imperfection is that it’s not just about the volume. For people who don’t know more about it, that seems to be the most overriding thing, and it’s not.

Wired: You say it’s not just about the volume. What’s it about?

Victor Mayer-Schonberger: It’s not about the volume in absolute terms. Yes, the total amount of data that we’re analyzing and capturing gets much bigger. But what we’re really focused on is that we have more data about a phenomenon relative to the total amount of data that is out there.

[Say] we have 60,000 data items and we’ve only sampled 100… If we get all the 60,000 data items that are out there, that is, in our terms, a lot of data. 60,000 is the number of bouts in Sumo wrestling that were analyzed in order to uncover match-fixing, as we describe in the book. That was every single Sumo wrestling match over the ten years that were looked at. That is not a sample of 100 or 200.

Wired: You say that the idea of identifying causal mechanisms is a “self-congratulatory illusion,” and that Big Data can destroy this illusion. What did you really mean by that? I think that a lot of people will feel like Big Data analytics will take away some of their humanity. Do you agree?

Mayer-Schonberger: Or gained it. [Daniel] Kahneman, in his book Thinking, Fast and Slow, makes the point that humans tend to come up with heuristic explanations for the causes of things around us all the time, but most of the time, these very quick heuristic causal explanations are wrong. We eat at a restaurant, we fall sick the next day, we think it was because we ate at the restaurant. More often than not, it has nothing to do with the restaurant. It has to do with whose hand we shook. Our causal fast thinking makes us believe in quick causal connections.

That is often very troubling. We should be very careful with this kind of fast causal thinking. And Big Data helps us because Big Data says: ‘Take a step back from looking at causes. Look at correlations. Look at the what rather than the why, because that is often good enough.’

Wired: We’re in the early days of applying these Big Data analytics techniques, so maybe it’s a little early for this question, but do you think that this phenomenon is changing the way we think? Are we emancipating ourselves from the shackles of this hardwired tendency to see causation where it doesn’t really exist?

Cukier: One thing that struck me about your question is how we’ve already changed… the way in which we think in a quantified way about everything.

When I talked to people about the book in Britain, I had a lot of university professors in the arts come up to me, and they were all complaining that you actually can’t put forward a grant these days in the arts without being able to quantify what you’re doing. And you’ve got artists who come up to me and yell: ‘How am I supposed to quantify my success? I’m an artist.’ They believe that this quest for quantification has gone too far.

Now I’d push back against that. I’d think it’s actually very reasonable that if you’re going to produce something like art, that you try to look for ways to improve it and understand it by, if you will, how many people it reaches, how many times it’s been shared on the internet. If it’s something that has an online complement to it, that will have an impact.

In the initial stages, what we’re seeing is that in all dimensions of life, people are thinking in a quantified way. The quantified self movement is just an example of that. Research grants are another. Obviously, so is policing and the idea of predictive policing, where police forces are using algorithms to identify where a crime is likely to occur and send the forces there.

This is the first wave of what we’re watching: big data layering itself on top of all of society.

Mayer-Schonberger: One immediate consequence of this understanding of the power of correlation is a shift in how we make sense of the world. The scientists developed the so-called scientific method. They came up with a theory or hypothesis of how the world would work and then they would go out and collect data to prove or disprove their hypothesis. But what if you don’t know the hypothesis? How can you test 50 million hypotheses? In the big data era, you can shift this around, much like Google did with Google Flu Trends. They didn’t know which of the 50 million search terms they tested needed to be connected and put in a model to model the spread of the flu, but they were able to find the 45 terms that made the most sense.

So Big Data enables us not to test the hypothesis, but to let the data speak and tell us what hypothesis is best. And in that way it completely reshapes what we call the scientific method or — more generally speaking — how we understand and make sense of the world.
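
To make the idea concrete, here is a minimal Python sketch of letting the data pick the hypothesis: score every candidate search term by how strongly its weekly volume correlates with observed flu cases, then keep the top few. It is an illustration only and not Google's actual method. The data is synthetic, and the names (pearson, top_correlated_terms, signal_term_*, noise_term_*) are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def top_correlated_terms(term_series, target, k):
    """Rank candidate terms by |correlation| with the target and keep the top k."""
    scored = [(term, pearson(series, target)) for term, series in term_series.items()]
    scored.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return scored[:k]


# Synthetic data (an assumption for illustration): one year of weekly flu
# counts with a mid-year peak, plus search volumes for many candidate terms.
rng = random.Random(0)
weeks = 52
flu_cases = [
    100 + 80 * max(0.0, 1 - abs(w - 26) / 10) + rng.gauss(0, 5)
    for w in range(weeks)
]

term_series = {}
for i in range(3):
    # Terms whose search volume genuinely tracks flu activity.
    term_series[f"signal_term_{i}"] = [0.5 * c + rng.gauss(0, 5) for c in flu_cases]
for i in range(200):
    # Terms unrelated to flu: pure noise.
    term_series[f"noise_term_{i}"] = [rng.gauss(50, 10) for _ in range(weeks)]

best = top_correlated_terms(term_series, flu_cases, k=3)
for term, r in best:
    print(f"{term}: r = {r:+.2f}")
```

One caveat for this kind of screen: with enough candidates, some unrelated terms will correlate with the target purely by chance, so the selected terms still need to be checked against data that was not used to choose them.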

Kenneth Cukier. Photo: Doubleshot.tv

Wired: In your book, you talk about Farecast. They were acquired by Microsoft for $110 million in 2008. And then Google paid $700 million a couple of years later for ITA Software, their data supplier. If you were starting a company today, would you own the data or would you be an intermediary?

Mayer-Schonberger: I would want to own the data, absolutely. But intermediaries will fare just as well if the people or the companies that they license the data from have no other choice but to license the data to them.

Wired: How would that happen?

Mayer-Schonberger: So, take the example of the predictive maintenance data that UPS has. They have a fleet of 60,000. And that’s really helpful, but in order to do really good predictive maintenance, you need to have a couple of hundred thousand cars — maybe a million cars in your database.

They can’t do it themselves. If [FedEx] went to UPS and said ‘Why don’t you give us the data and we’ll pool it together?’ they would have a problem with antitrust and so forth. So if a middle man comes in there and says ‘Give me your data. I’ll do the analysis and give you the results of the analysis,’ that is the very sweet spot for an intermediary to exist.

Wired: How is this changing computer science? Does everyone need to be a programmer?

Mayer-Schonberger: Yes, we still need a very large population of programmers, but programming will change in the sense that it will focus more on Big Data and data analytics rather than web user interface or transaction programming, as has happened in the past.

At the end of the day, it’s still writing code to manipulate data, but it will have a different application and a different goal.

Illustration: Ross Patton