Your phone's GPS can predict crime

This article was first published in the January 2016 issue of WIRED magazine.

MIT Media Lab professor Alex "Sandy" Pentland is a computer scientist and psychologist who studies human behaviour through big data: he calls this "social physics". He uses reality mining - collecting data from devices such as smartphones or GPS-enabled apps - to analyse how we act and interact. WIRED speaks to the director of the Connection Science and Human Dynamics labs to discuss what big data can reveal about us, and how it could make governments more accountable.

WIRED: Can you explain what work you do at MIT?
Alex Pentland: I am interested in technology and society, and how they work together. Right now the world is being datafied: as it becomes more digital, we get data from everything, about anything. What I am trying to do is use this data for good. We need the power of data to make things greener and safer, and we want to avoid misuse. I work on data security, data privacy, what you can do with data and, importantly, what data can tell us about ourselves. Data allows us to know a lot about ourselves, but also about our communities: whether they are likely to be full of crime, whether they are likely to be healthy or creative or poor or rich. It also means we can hold governments much more responsible, because you can see all these things.

What data are we talking about here?
All digital systems - for instance, mobile phones or credit-card-payment systems - leave behind little traces. There is data showing when [people] use credit cards, how many people are buying in a specific area, what sort of things they were buying and where they were buying them. And, of course, data from mobile-phone networks. So, companies collect that sort of data, whereas governments collect other kinds, on transportation and education. I call them "digital breadcrumbs" because they aren't the things that were intended to be measured - they're something that came along with the thing that was measured. How many people are in this square, how many people are buying food? Things like that.
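In code, those questions reduce to simple aggregation over the traces. A minimal sketch, using an invented, anonymised record format rather than any real data set:

```python
# Aggregating "digital breadcrumbs": how many people were in each square,
# and how many food purchases happened there? Records are invented.
from collections import Counter, defaultdict
from datetime import datetime

# Made-up anonymised transactions: (card id, square id, category, timestamp).
transactions = [
    ("c01", "square_12", "food", "2015-11-03 12:05"),
    ("c02", "square_12", "food", "2015-11-03 12:40"),
    ("c03", "square_07", "clothing", "2015-11-03 13:10"),
    ("c01", "square_12", "food", "2015-11-03 19:20"),
]

people_per_square_hour = defaultdict(set)  # distinct cards seen per (square, hour)
food_purchases = Counter()                 # food purchases per square

for card_id, square, category, ts in transactions:
    hour = datetime.strptime(ts, "%Y-%m-%d %H:%M").hour
    people_per_square_hour[(square, hour)].add(card_id)
    if category == "food":
        food_purchases[square] += 1

for (square, hour), cards in sorted(people_per_square_hour.items()):
    print("%s at %02d:00 -- %d people" % (square, hour, len(cards)))
for square, n in food_purchases.items():
    print("%s -- %d food purchases" % (square, n))
```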

What tools do you use to analyse them?
Mostly simple analysis software, general-purpose computation software. But we also write our own programs. To look at phone data, we developed specific software called Bandicoot [1].
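Bandicoot is a Python toolbox. The sketch below does not use the library's own API; it hand-rolls two of the indicator types listed in footnote [1] - number of calls and entropy of places - over invented call records, to show the kind of measure involved:

```python
# Illustration of two Bandicoot-style indicators (see footnote [1]),
# computed by hand over invented records rather than via the library itself.
import math
from collections import Counter

# Made-up call records: (direction, correspondent, antenna/place id).
records = [
    ("out", "alice", "antenna_3"),
    ("in",  "bob",   "antenna_3"),
    ("out", "alice", "antenna_9"),
    ("out", "carol", "antenna_3"),
]

# Individual indicator: number of calls made and received.
number_of_calls = len(records)

# Spatial indicator: entropy of places, i.e. how evenly activity is spread
# over the antennas where the phone was seen.
place_counts = Counter(place for _, _, place in records)
total = sum(place_counts.values())
entropy_of_places = -sum(
    (n / total) * math.log2(n / total) for n in place_counts.values()
)

print("number of calls:", number_of_calls)
print("entropy of places: %.2f bits" % entropy_of_places)
```

The third category in footnote [1], social-network indicators such as the clustering coefficient, needs the contacts' own call graphs as well, so it is left out of this sketch.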

How can you use this data to tell things about us?
One example: we used it to understand which areas in a city were more likely to have more crime, and why. We had transportation records and we had mobile-phone tower records showing how many people were in any given location at each time, where they came from and what their home town was. Analysing the data helped us make sense of whether there was a lot of diversity in a place, or whether just the locals were there. And it turned out that places that are really diverse are the places with the lowest crime rates. More importantly, if you have a place that used to be diverse and suddenly it's just the locals, something's wrong: people have fled. We like to joke that you should send social workers and the police there, because you're going to need them soon.
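One way to picture that analysis: score each area's diversity as the entropy of its visitors' home areas, then watch for a sudden collapse. The sketch below uses invented data and an arbitrary threshold; it illustrates the idea, not the study's actual method:

```python
# Diversity of an area measured as the entropy of its visitors' home areas,
# compared between two periods. All data here is invented.
import math
from collections import Counter

def diversity(home_areas):
    """Shannon entropy (in bits) of visitors' home areas: higher = more diverse."""
    counts = Counter(home_areas)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Home areas of visitors observed in one square, keyed by month.
visitors = {
    "2015-09": ["north", "south", "east", "west", "north", "east"],
    "2015-10": ["north", "north", "north", "north", "south", "north"],
}

before = diversity(visitors["2015-09"])
after = diversity(visitors["2015-10"])
print("diversity before: %.2f bits, after: %.2f bits" % (before, after))

# The pattern described in the interview: a place that used to be diverse
# and is suddenly "just the locals" is the one to watch.
if after < 0.5 * before:
    print("diversity collapsed -- flag this area")
```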

Is the data from social networks useful, too?
I don't tend to use social networks too much, because on social media you tell people what you think they want to hear. It's stuff you tell your friends, and that's interesting, but it's not necessarily true. The way we look at it is this: if you know where the social-media data comes from, and you know from another source of data that the place is very diverse, it is probably also a place where the social-media posts all sound happy. In that case we know the happiness is real.

How do we tackle the risk of our data being misused?
We set up the Data Transparency Lab [2] with the aim of tracking how data is used, and who is using it. I'm sure you say "Yes" to apps that ask to share your data. Do you know what happens with that data? I don't. But there are ways of telling who is using the data: you sample people and check what information they give out, you sample the people who are using the data, and you try to understand whether [the data's owners] knew that.

You said that data could help us hold governments to account. How?
Imagine that, when you run for office, you have to write down the promises you're running on. Then the city appoints a commission to evaluate you each month. Then imagine that every country regularly publishes all the data I mentioned, and it shows, for instance, that crime is growing. You could use that data to hold the politicians accountable.

Could politicians use data to make better choices?
Yes. We still don't understand what policies work, what sort of government is best. We have a lot of stories, but we don't know which are true. As you get more data, more experience, you'd go: "Oh! People living this way end up being healthier!" It's the same with crime. There are many theories about crime, and one of them is that if you have a lot of different people from different places there's more of it. It turns out that's wrong. Another theory is that if you have a lot of young people you'll have more crime - that turns out to be a bit true, but not in a significant way. We could get that just from looking at the data.

1. There are three Bandicoot categories: individual (eg number of calls, text response rate), spatial (eg entropy of places) and social network (eg clustering coefficient).
2. Mozilla, Telefónica, MIT Connection Science and the ODI are all involved.

This article was originally published by WIRED UK