Oct 30, 2014 7:16 AM

Madhumita Venkataramanan: My identity for sale

Even as you're reading this, your smartphone can reveal your location

I'm a 26-year-old British Asian woman, working in media and living in an SW postcode in London.

I've previously lived at two addresses in Sussex and two others in north-east London. While I was growing up, my family lived in a detached house, took holidays to India every year, donated to medical charities, did most of the weekly shopping online at Ocado and read the Financial Times. Now, I rent a recently converted flat owned by a private landlord and have a housemate. I'm interested in movies and startups, have taken five holidays (mostly to visit friends abroad) in the last 12 months and I'm going to buy flights within 14 days. My annual income is probably between £30,000 and £39,999. I don't have a TV or like watching scheduled television, but enjoy on-demand services such as Netflix and NOW TV. I passed through Upper Street in north London every day last week. I can cook a little but tend to eat out or get takeaways often; foreign foods (Thai and Mexican) are my favourite. I don't own any furniture and don't have children. I've never been married.

I often eat with my university friends on weeknights. I don't care for cars or own one. I dislike any form of housework, and have a cleaner who lets herself in when I'm at work. I shop for groceries at Sainsbury's, but only because it is on my way home. I am not attached to my neighbourhood, and have no contact with my neighbours; I like the idea of living abroad some day. I prefer working as part of a team rather than alone; I'm ambitious and it is important to me that my family thinks I'm doing well. I often go to the pub on Fridays after work. At home, I am far more likely to be browsing restaurant reviews than managing my finances or looking at property prices online. I am rarely swayed by others' views.

This motley set of characteristics, desires, thoughts and attitudes comes very close to defining me as a person. It's also a precise and accurate description of what a group of companies I had never heard of - personal-data trackers - has learned about me.

Earlier this year, I became curious about the personal-data economy. It has grown relentlessly into a multibillion-pound business of tracking, packaging and selling data picked up from our public records and our private lives. As I dug deeper into the world of trackers, it reinforced my anxieties about a profit-led system designed to log behaviour every time we interact with the connected world. I was aware that the data generated by apps and services I use daily - from geolocation and cookies to social-media tracking and credit-card transactions - was building a record of my past.

Combine this with public information such as Land Registry, council tax and voter-registration data, daily location routes and social-media posts, and these benign data sets reveal a lot -- such as whether you're political, outgoing, ambitious, pessimistic, uptight or a risk taker.

Even as you're reading this -- you may be sedentary, but your smartphone can reveal your location and even your posture -- your life is being converted into such a data package; once it has been compiled into lists (interested in technology, subscribes to magazines, probably male, professional, high earner) by intermediaries known as data brokers, it's sold on to data aggregators and analysts and eventually any company.

Ultimately, you are the product.

Under the EU's Data Protection Directive, implemented into UK law in 1998, personal data can be sold to third parties only with your consent and once it has been stripped of your name and any unique identifiers such as National Insurance number. Data is "personal" when third parties are able to link the information to an individual, even if the person holding the data cannot make this link. Simple examples of "personal data" are: full address, credit-card number, bank statements, criminal record etc. Third parties can process the data for their own interests as long as the data subject has consented (this can be assumed, not explicit) and can access or rectify any incomplete and inaccurate entries (Article 12).

But particulars such as your postcode, age and gender can be traded -- because they are not personal, but "pseudonymous", defined by the EU Data Protection Regulation (a draft law that has yet to be adopted) as "personal data that cannot be attributed to a specific data subject without the use of additional information".

But the data business has outgrown the directive. With the globalisation of data flows, there is so much additional information available that the risk of "pseudonymous" data being identifiable has multiplied. In the hands of commercial organisations with basic statistics skills, subsets of data can be unlawfully cross-referenced with other data points about you, to identify you with ease.

"Removing someone's name [from a list] to make them anonymous, that's an idea that went out in computer security 15 years ago," says computer scientist Joss Wright, from the University of Oxford's Internet Institute, who focuses on data anonymisation and privacy-enhancing technologies. "If I know where you live and your salary and how many children you have and your medical conditions and your location patterns, then what's in a name?"

In other words, truly anonymising data about an individual is much more difficult than removing their name. In fact, the more data points collected for an individual, the more likely their record is to be unique. According to Pew Research, an average adult Facebook user has 338 friends - that's at least 338 columns or "dimensions" of data per user.

A data set of your mobile-phone locations over an hour will have over 500 dimensions (a phone beams its location to a cellular tower every seven seconds or so); health data, depending on how many types of records you have, could have thousands of dimensions; and genetic data is one-million dimensional (we have about 1.3m genes in total). Once this is matched with any other data set that contains constants such as your age, sex and address, you've been found. In other words, unexpected information can become "personal" when combined with enough other relevant bits of data.

Latanya Sweeney, director of the Data Privacy Lab at Harvard University, has shown that roughly 87 per cent of people in the US can be uniquely identified by the combination of just three facts about them -- zip code, age and sex. Given that Sweeney was referring to a five-digit zip code in the US, which addresses about 320 million US citizens, rather than the 65 million UK citizens serviced by our longer postcodes, this likelihood is far higher for UK citizens.
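To make the idea of being "uniquely identified" by a few facts concrete, here is a minimal sketch that measures what share of records are the only ones with their particular combination of zip code, age and sex. The six records are invented for illustration and the function name `unique_fraction` is hypothetical; nothing here reproduces Sweeney's data or method.

```python
from collections import Counter

# Toy records invented for illustration: (zip code, age, sex).
# These are not real people or real statistics.
records = [
    ("02138", 52, "M"),
    ("02138", 52, "F"),
    ("02138", 34, "F"),
    ("02139", 34, "F"),
    ("02139", 34, "F"),  # shares all three values with the record above
    ("02139", 61, "M"),
]


def unique_fraction(rows):
    """Return the share of rows whose combination of values appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of toy records are unique on zip code, age and sex")
```

In this toy set, four of the six records are unique on the three fields. Each field added to the combination can only keep that share the same or push it higher.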

Sweeney had already proved this was possible in 1997 when William Weld, then governor of Massachusetts, supported the commercial release of 135,000 anonymised medical records of state employees and their families. Each patient was assigned an ID, instead of a name and social security number, and their record contained their gender, zip code and birthdate along with more than 100 fields of sensitive information about prescriptions and treatments at hospitals. For $20, Sweeney purchased the voter-registration records for Cambridge, Massachusetts, containing the name, address, zip code, birthdate and sex of each voter, and cross-referenced it with the anonymised hospital records.

The result: she pinpointed Governor Weld’s health records with ease. Only six people in Cambridge shared his date of birth, only three of them men and, of these, only he lived in his zip code.

The linking of health and demographic data is already occurring. The NHS produces a “pseudonymised” set of records called the Hospital Episode Statistics (HES) database. This contains every instance of a patient in England using a hospital-based service since 2001, covering 47 million patients, identified by date of birth, gender and address. “So you might have two databases each with age, postcode and sex included, along with several other features. You can then use those three values to identify the same individual in both data sets,” explains Joss Wright.
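The cross-referencing Wright describes can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of any real attack: both data sets, every name and the helper `link` are invented, and real records would need fuzzier matching than the exact comparison used here.

```python
# Toy data invented for illustration; none of it describes real people.
# "Anonymised" hospital records: no names, but quasi-identifiers remain.
hospital_records = [
    {"patient_id": "P001", "birthdate": "1950-01-15", "sex": "M", "zip": "02138", "diagnosis": "condition A"},
    {"patient_id": "P002", "birthdate": "1950-01-15", "sex": "F", "zip": "02138", "diagnosis": "condition B"},
    {"patient_id": "P003", "birthdate": "1972-06-30", "sex": "M", "zip": "02139", "diagnosis": "condition C"},
]

# Public register (for example, a voter roll) with names attached.
public_register = [
    {"name": "Resident One", "birthdate": "1950-01-15", "sex": "M", "zip": "02138"},
    {"name": "Resident Two", "birthdate": "1950-01-15", "sex": "F", "zip": "02138"},
    {"name": "Resident Three", "birthdate": "1980-03-02", "sex": "M", "zip": "02139"},
]

KEYS = ("birthdate", "sex", "zip")


def link(records, register, keys=KEYS):
    """Attach a name to each record whose key values match exactly one register entry."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    linked = {}
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the record
            linked[record["patient_id"]] = names[0]
    return linked


reidentified = link(hospital_records, public_register)
print(reidentified)
```

Two of the three toy patients are matched to a name even though the hospital table never contained one. The third escapes only because the register holds nobody with the same three values.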

The UK Health and Social Care Information Centre (HSCIC), which maintains this list, publishes a quarterly register of the organisations with which it has shared the data, including consulting firm McKinsey, BUPA and pharmaceutical firm AstraZeneca. In July, it made an interesting addition: Experian – a global information-services company that employs 17,000 people in five countries and had revenue of $4.7 billion (£2.8bn) last year; it provides credit-reporting and marketing services and acts as a data broker, collecting personal information on citizens, companies, vehicles and insurance in the UK, US, Brazil and elsewhere, including historical postcodes, gender and lifestyle habits.

Experian combines the hospital data with its marketing product Mosaic, which provides 66 detailed categories of people living in specific postcodes (O64 – my category – is called Bright Young Things. It refers to educated young professionals in urban flats). Because HES data contains a partial postcode and birthdate, plus gender for most patients, this is cross-linked with Mosaic to understand health problems of people in specific postcodes. “In terms of the HES information, pseudonymised data is provided to Experian – this is information that has had key patient identifiers removed,” says Experian’s spokesperson Nishma Shah. She did not say which demographic identifiers remained.

Although it will be without our names and phone numbers, our mobile-location data is also currently being tracked and sold “anonymously” – but with a unique MAC number attached. “In the last couple of years, an entire industry has developed around monitoring the signals your Wi-Fi radio gives out, and your distinct device can be recognised as you’re walking down the street,” says Seth Schoen, chief technologist of the Electronic Frontier Foundation. You don’t even have to be connected to a Wi-Fi network; your phone just has to be on. Canada-based location-analytics company Viasense uses these Wi-Fi sniffers to pick up your location patterns and MAC number. Viasense tracks the locations of six to ten million devices each day in Toronto, and builds lifestyle categories based on owners’ movements (people in parks between 6am and 9am are classified as joggers), which CEO Mossab Basir claims it can track down to the square metre.

Telecoms providers such as Vodafone, Telefonica, EE and Verizon have disclosed that they sell anonymised location data (without MAC numbers) packaged into user categories to retailers who want to know their customers' footfall patterns. Free apps are the most effective way for third parties to acquire your locations. "Apps like Flashlight, Mirror, the gaming apps -- any that ask you for your location are sending it back to the developer continuously, who can sell it to advertisers," says Schoen. "That's why they're free."

In March 2013, researchers from the MIT Media Lab studied 15 months of anonymised locations, donated by an unspecified mobile provider, of 1.5 million people in an unnamed European country. They had customers' location stream on an hourly basis, but didn't know who they were. "We found that we needed just four approximated places and times to uniquely identify 95 per cent of people. About 50 per cent could be identified from just two points," says Yves-Alexandre de Montjoye, first author of the resulting paper and PhD student at the Human Dynamics lab in the Media Lab.

So if you had even four pieces of information that placed someone at a particular place at a certain time – gleaned from social-media posts or public records such as the press – you could untangle their location history in this data set for an entire year. “Even with location information that is a lot less accurate (at a resolution of eight hours, for instance), only a few more points are needed to be able to identify individuals,” de Montjoye says. “Your mobility trace is very personal. We found that it’s actually easy to identify.”
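The narrowing-down de Montjoye describes can be illustrated with a toy example. This is a sketch under stated assumptions: the three traces, the antenna names and the helper `candidates` are all invented, and the MIT study worked with far larger and noisier data than an exact set comparison can handle.

```python
# Toy location traces invented for illustration: each user is a pseudonymous ID
# mapped to a set of (antenna, hour) points. Not real data.
traces = {
    "user_a": {("north", 8), ("centre", 9), ("centre", 13), ("east", 18)},
    "user_b": {("north", 8), ("centre", 9), ("west", 13), ("west", 18)},
    "user_c": {("south", 8), ("centre", 9), ("centre", 13), ("south", 18)},
}


def candidates(known_points, all_traces):
    """Return the IDs whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return sorted(uid for uid, points in all_traces.items() if known <= points)


# One observation (say, from a public post) leaves every user as a candidate.
one_point = candidates([("centre", 9)], traces)
print(one_point)

# Adding a second observation narrows the field to a single trace.
two_points = candidates([("centre", 9), ("east", 18)], traces)
print(two_points)
```

Once only one candidate remains, every other point in that trace belongs to a known person, which is why a handful of outside observations can unlock a whole year of movements.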

If you think you don’t care about being unmasked, you may want to reconsider. Personalised browser ads may be harmless, but allowing disparate aspects of your life to be combined and identified could have unexpected consequences. “The issue is: are decisions, such as whether I get to go to a particular university or whether I end up getting insurance cover, being made about me using data I never intended to give to a third party – for example, my lifestyle or a family member’s ailment?” says Tim Sparapani, privacy lawyer at the Apps Alliance and ex-director of public policy at Facebook. “This is not about the future – it’s already starting to happen.”

Earlier this year, Cambridge University security engineer Ross Anderson found that NHS hospital data had been sold to, among others, the Institute and Faculty of Actuaries, which has used the information to calculate patients’ risk of developing critical illnesses, and so “improve the accuracy of pricing” (read: increase insurance cost). “The Department of Health was selling off every NHS hospital patient’s records for insurance purposes,” says Anderson.

“They claimed it was non-sensitive because it had been anonymised, but that’s false.” The institute’s spokesperson, Annette Heninger, confirmed the data sent over. “The IFoA paid £2,200 as a processing fee for the HSCIC to provide the data. It contained the first part of the postcode, gender and age of every patient at the start and end of hospital admission (no dates of birth) and up to 20 different categories of diagnosis.” The actuaries cross-referenced this data set with lifestyle information on patients’ postcodes from Experian’s Mosaic database. They found that levels of critical illness among most customers below the age of 50 were higher than previous calculations had found. As a result, insurers were likely to raise insurance premiums for this group, according to Anderson. “We let the facts and the numbers speak for themselves,” says Heninger, from the institute. “Any member of the public is able to… make their own decisions about the research and the implications of that research.”

Although this revelation caused a furore in the press, the NHS decided to push ahead with the next phase of its open-data plan – the release of all NHS England’s GP data in a similar form to the hospital data, in a data set called Care.data. The NHS pledged to inform citizens by writing that their data would be made public; leaflets did not always reach their targets. “Most of it was thrown into junk mail, and we had to meet with the Secretary of State to ask for it to be an opt-out process,” says Sam Smith, cofounder of UK-based medical privacy non-profit MedConfidential.

The leaflet reads: “We will use information such as your postcode and NHS number to link your records from these different places [GP practice, hospitals and community services]. Records are linked in a secure system so your identity is protected.” It also states: “We sometimes release confidential information to approved researchers, if this is allowed by law and meets the strict rules that are in place to protect your privacy.”

The data will include referrals, diagnoses, NHS prescriptions, your family history, vaccinations, blood-test results, BMI, smoking/alcohol habits etc – dating back to April 2013. The information – which will exist for every person who uses the NHS in England – will be extracted as a series of codes, not as words or sentences. “This granular data set would include linked data of every time any individual in the NHS went to their GP,” says Smith. “That’s basically all the medical history for everybody in England, forever.”

Identifiers such as date of birth, postcode, NHS number and sex are required in order to link the GP data with the data from hospitals. But these details will be modified for privacy reasons when linked to other data sets, and the identifiers will be stored separately, according to the HSCIC. Critics believe that HSCIC retains, at the very least, a look-up table for these identifiers. Because this is a linked, long-term data set pertaining to individuals, patients are very vulnerable to being identified.

“We were told this data would be anonymised. This is a really stupid thing to say. Really stupid. Because the degree to which you can anonymise data is in inverse proportion to its usefulness. Almost exactly,” Conservative MP David Davis tells WIRED. “I have broken my nose five times and I’m 65. Once you know that, you’ve reduced the pool to about 100 people. Then you check the data for when I had my diphtheria jab, usually done at birth, and that’s it, you got me.”

Both the HSCIC and NHS claim that Care.data won’t be shared with commercial organisations and will only be used for research purposes. But in April this year the HSCIC publicly disclosed that HES data had been given (it claims fees were for processing the data) to over 1,000 firms. Of these, at least five, including IT company OmegaSolver, were given a commercial-reuse licence, according to Sam Smith. “This means they can sell the data on,” he says. NHS data was used by OmegaSolver to help pharmaceuticals sell their drugs. The company produces a dashboard that visualises the medical history of 163,316 patients from the HES dataset.

An example on its website captured by MedConfidential shows female patient OS060900, aged 81-85, who had five conditions diagnosed in October 2010. She had 257 hospital visits, mostly as an outpatient, but also had a five-day stay at which point eight conditions were diagnosed. If OmegaSolver had her postcode too, it could easily match this with electoral records for her. The company declined to comment.

Care.data was put on hold in February, but will be restarted soon across roughly 500 GP practices, according to NHS England. Tim Kelsey, NHS national director for patients and information, was unavailable for comment.

American bank Capital One is using personalised data to decide which financial products to show first-time customers. It uses the services of New York-based data-tracking firm [x+1], which claims it can determine personal details (not names) of a website visitor within 200 milliseconds.

According to cofounder Ted Shergalis, it analyses information about users' devices, browsers and location through IP addresses; it also buys postcode data and profiles about users' hobbies and interests from online tracking companies and data brokers.

In 2010, the Wall Street Journal approached [x+1] for a piece on how its customer profiling worked. Capital One allowed [x+1] to identify two volunteers -- Thomas Burney, a graduate with no children who worked in management and owned his home; and Carrie Isaac, a young small-town mother with a midscale income. The Journal's exposé showed how, based on [x+1]'s assessment, Capital One promoted to Isaac some of its least generous cards, while Burney saw only one, the Capital One Prestige Platinum, which included no initial interest or annual fee. Both participants had assumed they were offered the only available options, and were unaware of the tailoring.

During an average day, I wake to my phone, fumble for it on the bedside table and take in the data dump. Today, there are 12 emails, three tweets, four Facebook notifications, two Endomondo prompts and a reminder about a meeting.

The data deluge is everywhere I go – a row of 12 recycling bins with display screens is lined up near Bank station; a screen in the forecourt of a Tesco in Southwark advertises products all day; the Godiva chocolate shop on Regent Street arranges its window display as a form of advertising. Every step of the way, strangers are tracking what I do and drawing a richly detailed profile. When I log data into Endomondo, the app accesses my sex, location and date of birth to share with advertising company DoubleClick, according to cofounder Mette Lykke.

Other health apps do it too: according to independent research conducted for WIRED by Princeton University researcher Ian Davey, WebMD for iOS leaks a large amount of potentially sensitive medical data. When a user views pages for specific drugs, conditions or procedures, the identifiers and sometimes names of what was viewed (along with a unique phone identifier) go to Scorecard Research, Nativo, Adobe Omniture and AOL’s subsidiary Gravity. “Any data is subject to confidentiality obligations and can only be used by these contractors to provide services to WebMD,” says spokesperson Catherine Daniels.

As long as my phone is on, my movements can be tracked. Until August 2013, the bins at Bank tube station were picking up phones' unique MAC numbers via their Wi-Fi, without the knowledge of phone owners - a technology built by London startup Presence Orb and implemented by Renew London. While it was tracking passers-by, Renew - now in administration - would know if I was the same person who passed by yesterday, my specific route and how fast I was travelling.
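How a set of sensors could turn a stable device identifier into a route and a walking speed can be sketched as follows. This is not how Presence Orb's or Renew's system worked, which the article does not describe in detail: the identifiers, timestamps, sensor names and the 50-metre spacing are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy sightings invented for illustration:
# (device identifier, sensor, seconds since midnight). Not real data.
sightings = [
    ("aa:bb:cc:00:00:01", "bin_1", 30_000),
    ("aa:bb:cc:00:00:02", "bin_1", 30_010),
    ("aa:bb:cc:00:00:01", "bin_2", 30_040),
    ("aa:bb:cc:00:00:01", "bin_3", 30_080),
]

# Assumed position of each sensor along the street, in metres.
SENSOR_POSITION_M = {"bin_1": 0.0, "bin_2": 50.0, "bin_3": 100.0}


def routes(rows):
    """Group sightings by device and order each group by time."""
    by_device = defaultdict(list)
    for device, sensor, timestamp in rows:
        by_device[device].append((timestamp, sensor))
    return {device: sorted(seen) for device, seen in by_device.items()}


def average_speed(route):
    """Return metres per second between first and last sighting, or None if unknown."""
    if len(route) < 2:
        return None
    (t0, s0), (t1, s1) = route[0], route[-1]
    if t1 == t0:
        return None
    return abs(SENSOR_POSITION_M[s1] - SENSOR_POSITION_M[s0]) / (t1 - t0)


device_routes = routes(sightings)
for device, route in device_routes.items():
    print(device, [sensor for _, sensor in route], average_speed(route))
```

No name is needed at any point. The identifier only has to stay the same from one sighting to the next, which is also why replacing it with a fixed pseudonym would change nothing.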

That's exactly what the sensors in the Godiva shop do. ShopperTrak, a Chicago company, counts the people passing by Godiva via their phones, and modifies the display accordingly. According to Russell Evans, VP of global marketing, ShopperTrak also uses in-store Wi-Fi sensors to track customers' phones, so it knows if they came back.

The screen in Tesco's forecourt has a camera installed by London-based Amscreen, as it has in 500 stores around the UK. "The screens are not just passive TVs -- they are like giant mobiles with sim cards, 3G and GPS. The algorithm, made by French company Quividi, knows someone is standing in front of it and figures out your sex and age, while also recording the time and location," says Mike Lemmings, head of marketing and product development at Amscreen. Demographic patterns can then be sold to advertisers. Tesco now knows I am a 26-year-old female and, if combined with my Clubcard data, it could find out what I buy and where I live, as well as my demographic details.

I decided to piece together what these companies really knew about me. I spoke to Eyeota, a data-analytics profiler with offices in Singapore, Berlin and Sydney. It has partnerships with a range of websites, so when I visit their pages, it can place a cookie on my browser. It also buys data from Experian to enhance each cookie profile. Using my browsing activity, Eyeota assigns my cookie up to a thousand attributes ranging from sex and region to type of job, whether I have children, own a car, like to buy Star Wars memorabilia and so on. It never finds out my name, but it knows more about me than my neighbours do.

My Eyeota cookie knows that I'm a 26- to 35-year-old female, working in the media/internet industry.

I'm interested in entertainment, particularly movies, and in entrepreneurship and startups. I intend to buy flights in the next 14 days.

Eyeota also buys data from Experian's Mosaic database: a collection of 15 demographic groups and 66 lifestyle types based on your postcode and calculated from a variety of data sources which Experian would not disclose (it could mention only voter-registration records and census records as examples). It matches you up to a type.

My Mosaic type runs to 16 pages, with predictions about everything from my financial circumstances to my view on the world. It includes an accurate description of my ethnicity, age, education, profession, home life and financial circumstances.

Because Eyeota buys this profile, it knows I probably take taxis rather than buses when I get home late, that I am unlikely to visit DIY stores, that I spend a large part of my income on eating out and long-haul, use sites such as Airbnb, and that most of my friends are people I met at university. It can then sell this information to the highest bidder.

Profiling individuals has been a gold rush. "It expanded post-9/11 because governments were trying to prevent the next 9/11," Sparapani says. "This was all to try to predict who is related to whom, engaged in terrorist activity, or laundering money. A new ecosystem was established." Now, the data is traded commercially and sustained by advertisers -- the Interactive Advertising Bureau says revenues in the US last year hit an all-time high of $42.8bn. A large part of it is based on selling targeted user data on Facebook and Google. A range of more sensitive data is now available, leaving us much more exposed.

Now our tax records could be exposed in a similar way to our NHS records. "[In July], we had a meeting in Parliament about a proposal from HM Revenue and Customs to sell our tax records to people in the City and remove names and addresses from it but assign a unique ID," Ross Anderson says. Such a plan to sell off anonymised tax records is described as "borderline insane" by David Davis. "If they're [like] the NHS, the unique ID might be postcode. If all the tax records at a postcode are lumped together, there are only a few houses it could be," says Anderson.

HMRC has already launched a pilot, releasing data, such as names and addresses of companies that are VAT registered, to three credit-ratings agencies, including Experian. "It's really hard to see how this is anything but commercial, frankly. [Companies] are going to make value by adding together tax records, social records, existing databases and health records. They're going to know more about us than we know ourselves," says Davis. "This is statewide identity theft."

And the more data that becomes available, the easier it will be to identify you. "All biometric sensor data sources are going to be pretty easy to re-identify," says Scott Peppet, privacy lawyer at the University of Colorado. "Think about your heartbeat or how you walk, or your pattern of exercise. There's gonna be no one who has the identical patterns to you."

In June 2013, the CIA's chief technology officer Ira Hunt gave a talk at GigaOM's Structure:Data conference in New York about the importance of wearables such as the Fitbit to the security services. He said not only could you infer an individual's sex, height and weight from Fitbit data, but that they were "100 per cent guaranteed to be identified simply by [their] gait".

When contacted by WIRED, Fitbit representative Katie Henry said, "We use Mixpanel (and other analytics providers) to understand how customers use Fitbit. Mixpanel, just like all our third-party service providers, is prevented from using any personally identifiable data for any other purposes."

As the data we generate about ourselves continues to grow exponentially, brokers and aggregators are moving on from real-time profiling -- they're cross-linking data sets to predict our future behaviour. Decisions about what we see and buy and sign up for aren't made by us any more; they were made long before. The aggregate of what's been collected about us previously -- which is near impossible for us to see in its entirety -- defines us to companies we've never met. What I am giving up without consent, then, is not just my anonymity, but also my right to self-determination and free choice. All I get to keep is my name.

This article was originally published by WIRED UK