Apr 19, 2016 9:57 AM

Don't own your metadata? Prepare to get owned

This article was first published in the May 2016 issue of WIRED magazine.

I once received an email for Jeff Pomerantz. Because my name is Jeff Pomerantz, you may not think that's worth commenting on. However, it turns out that there are several Jeff Pomerantzes in the world, and I had received an email for the wrong one. One of my namesakes turns out to be a soap-opera actor, and I had received a flight itinerary that should have been emailed to him. I can only assume that in addition to our having the same name, we also have similar email addresses.

This was a very mild case of mistaken identity, and frankly more of a problem for the other guy than for me. (I hope he made his flight.) But it's easy to imagine more severe cases. You don't have to imagine too hard, because cases abound. In the US, the No Fly List enforced by the Transportation Security Administration has something of a reputation for being prone to cases of mistaken identity, preventing ordinary citizens, from congressmen to toddlers, from boarding their flights.

No one is going to mistake a two-year-old for someone likely to hijack a plane. No one will mistake me for a soap-opera actor. This is not mistaken identity. This is mistaken metadata.

Metadata is data about something, some statement about a thing in the world. That thing can be an inanimate object (the title of this publication is WIRED) or a person (my name is Jeffrey). This kind of descriptive metadata is extremely common; so much so that most of us create some every day without even thinking about it, as we name files and write the subject lines of emails and tag YouTube videos. The name Jeffrey is not me, but it represents me; the subject line "funny cat video" is not the content of the email, but it represents that content.

Librarians use tools called authority files to reduce the risk of cases of mistaken identity. I have a friend and colleague named Paul Jones, for example; this is hardly a unique name, and my friend delights in sharing a name with the likes of pirates and rock stars. The Virtual International Authority File, which is a collaboration among several dozen national libraries worldwide, has a separate record for each of the many Paul Joneses throughout history. Librarians are concerned with disambiguating names and concepts, because doing so helps make them easier to use. Other types of organisations may not have so much concern for being user-friendly.
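To make the idea concrete, here is a minimal sketch of how an authority file keeps namesakes apart. It is an illustration only, not how the Virtual International Authority File is actually built: the record identifiers, the descriptions and the function names are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    record_id: str    # hypothetical identifier, not a real VIAF number
    name: str         # the name as people actually write it
    description: str  # the detail that tells namesakes apart


# Invented records, for illustration only.
AUTHORITY_FILE = [
    AuthorityRecord("example-001", "Paul Jones", "18th-century sailor"),
    AuthorityRecord("example-002", "Paul Jones", "20th-century rock singer"),
    AuthorityRecord("example-003", "Paul Jones", "present-day university lecturer"),
    AuthorityRecord("example-004", "Jeff Pomerantz", "soap-opera actor"),
    AuthorityRecord("example-005", "Jeff Pomerantz", "information scientist"),
]


def find_by_name(name):
    """Return every record sharing this name; a name alone may be ambiguous."""
    return [record for record in AUTHORITY_FILE if record.name == name]


def find_by_id(record_id):
    """Return the single record with this identifier, or None if absent."""
    for record in AUTHORITY_FILE:
        if record.record_id == record_id:
            return record
    return None


if __name__ == "__main__":
    # Searching by name surfaces all the namesakes at once.
    for record in find_by_name("Paul Jones"):
        print(record.record_id, record.description)
```

The point of the design is that the name is treated as a label that several records can share, while the identifier is what points at exactly one person. An itinerary addressed to an identifier rather than to a name could not have reached the wrong Jeff Pomerantz.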

Where libraries leave off in managing this type of metadata, other service providers have stepped in. An industry has sprung up around identity management, which spans everything from controlling logins and access permissions in online systems to controlling an individual's or corporation's online brand. Some services, such as the now defunct ClaimID, allowed you to curate a set of links about yourself - these are about my professional life, these are personal, etc - and to declare another set of links as not about yourself, as for example I would about the IMDb entry for Jeff Pomerantz. Other services such as about.me and Google+ profiles have stepped into the void left by ClaimID.

It's a challenge to manage data about you that you didn't create yourself. The data that's produced as you go about your day-to-day activities is often referred to as "data exhaust". This is a good term for it, as the word "exhaust" captures the idea that this kind of data is the by-product of other processes. And, of course, what's produced can be collected. It's difficult bordering on impossible to control one's own data exhaust, short of dropping out of modern life entirely. Every time we go online, we produce data exhaust. Every time we buy something with anything other than cash, we produce data exhaust. If we even go outdoors where there are security or traffic cameras, we produce data exhaust. And this data may be collected by others, and usually is; others over whom we may or may not have any authority.

Equifax is an American credit reference agency that also does business in the UK. It has a data-management policy that allows you to obtain all of the data that it maintains about you, and correct any erroneous data. Advice from financial planners has of course long been that you should check your credit report at least annually. Less ingrained is that you should check and correct the data that all other services maintain about you. This is easier for some services than others: Facebook, Twitter, LinkedIn and more are based around users' profiles, so they make it simple for you to edit yours. Many other services, such as your mobile-phone provider, have policies similar to Equifax's, but just how to avail yourself of these policies may be hidden on their websites.

Of all the metadata about yourself that exists, there's only a limited amount that you can access, let alone control. But what metadata you can, you must. Own your metadata. If you don't, someone else can own you.

Jeffrey Pomerantz is information scientist and senior research analyst at the EDUCAUSE Center for Analysis and Research

This article was originally published by WIRED UK