Pack Up Your Data and Leave Whenever You Want, It’s the New Rule of the Cloud

There's a certain level of trust that goes along with using a cloud-based web application. You upload your photos and your documents so you can access them everywhere, but you also trust that you'll be able to pull those photos and documents down any time you want.

It sounds like a perfectly reasonable assumption, but many web-based services make it difficult for you to export your data. Worse, they'll charge you a fee for the privilege. Some offer APIs -- a bonus if you're technically astute, but a solution that leaves the average user short on options.

To prevent such headaches, Google recently launched the Data Liberation Front, an initiative within the company to ensure every one of its products has a clear, easy option for users to export their data in bulk and take their business elsewhere.

Leading this project is Brian Fitzpatrick, an engineering manager at Google. Brian and his team launched an educational website at dataliberation.org in September where you can track their progress and find instructions for exporting your Blogger blog, your Picasa photos, your Gmail inbox, or whatever service you want to bail on.

It may seem odd as business strategies go, but as a practice, data portability and the trust it engenders are key to fueling the growth of the open web. In the following interview, Brian explains why this concept is especially important now, as more of us are sharing our data not only with Google, but with Facebook, Yahoo, Microsoft and other major players. He also hints at some new export features coming to popular Google products -- like the ability to export all of your Google Docs files in a single, downloadable Zip archive.

Webmonkey: What led to the creation of this initiative within Google?

Brian Fitzpatrick: Even before I joined Google, I heard (CEO) Eric Schmidt speak. And one of the things I heard him say time and time again is, "we don't lock our users in." If they wish to leave, they are free to do so, and they can take their data with them. After I started, the one thing I kept hearing over and over again from the team was that we focus on our users first, and everything else follows that principle.

In talking to other engineers here, I realized that we don't lock our users in. But while the door isn't ever locked, in some cases, it could use a little bit of grease. It's a little stuck.

We asked various product people if they've looked into doing an easy bulk-export type of thing where users can take out their data -- and put it in -- en masse. The typical response was, "Oh, it's been on our roadmap for four or five years, it's just de-prioritized because we have to work on these things our users are demanding." So, it just wasn't getting done.

We decided to start a small team of engineers to do just that -- go around to our various products and help build those systems.

WM: So it wasn't a question of evangelizing data liberation since the product managers were already sold on it, but more of a mission to go install the plumbing?

BF: Yeah, but we're also trying to raise awareness in general. Most engineers don't typically think about data liberation. They're more involved in launching products. But I think it's important because it's a way for our users to trust us more.

WM: How much do you see the Data Liberation project as good policy for Google internally versus good policy for the web in general?

BF: I would love nothing more than for other companies to copycat us on this. It's good policy because we're in a different world than we were in ten years ago.

If you wanted a piece of software ten years ago, you'd go to the store, buy a box and take it home. If you wanted to try another piece of software, you'd have to go back to the store, buy another floppy and do the whole process over again. There was a huge barrier to trying different things out.

Today, if you want to try something else out, you just type another URL in your web browser. We want people to try our software, and if we're going to encourage people to put data in the cloud and use more cloud-based apps, it's important to show that it should be easy to get that data out as well. I want more people to think about this. It's an important thing, and most people don't think "I want to get my data out," until it's too late.

To be very clear: It's not that Google is just an altruistic, lovable, huggable company. I think we're a good company, but we get a benefit from this. We benefit from the work we do with open web standards, open-source and data liberation. But if you're using a Google product now and you decide to go somewhere else, the easier we make it to leave and take your data with you, the more likely you are to come back and use something we come out with in the future.

There's also the "rising tide floats all boats" analogy -- the more we contribute to the success of the internet, the more we contribute to our own success since we're such a big player.

WM: So, are you taking steps to future proof your products as well? Like in the case of Google Docs, or in the case of feed-based data, are you making sure what's supported today will be supported in 10 years?

BF: We're focusing on open formats wherever possible. So, you'll see things coming out in open-documented Atom feeds, XML feeds.

In the case of Docs in particular, there's something great we're working on at the moment. Right now you can get your docs out one at a time. We're working on a way for you to be able to select multiple docs at once, choose whatever format you'd like -- ODF or MS Word or whatever -- and our server will convert everything for you, create a Zip file and stream it down to you. (Brian says this feature will launch within the next couple of months.)
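
For readers curious what the "create a Zip file" step might look like, here is a minimal Python sketch. It is not Google's implementation, which the interview does not describe. The function name `bundle_documents` and the sample file names are hypothetical, and the format conversion Brian mentions is assumed to have already happened.

```python
import io
import zipfile


def bundle_documents(documents):
    """Pack already-converted documents into one in-memory Zip archive.

    `documents` maps a file name to that file's bytes. Converting each
    document to ODF, Word, etc. is assumed to happen before this step.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in documents.items():
            archive.writestr(name, payload)
    # The returned bytes could be written to disk or sent as a download.
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical file names and contents, for illustration only.
    data = bundle_documents({"notes.odt": b"hello", "budget.doc": b"world"})
    with open("my-docs.zip", "wb") as handle:
        handle.write(data)
```

A real service would likely stream the archive as it is built rather than hold it all in memory; the in-memory buffer just keeps the sketch short.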

WM: That's great for backups.

BF: This is interesting, too. Last winter, we launched Blogger liberation. When you log in to Blogger, there are options for "import blog" and "export blog." It's a nice, user-friendly experience, an easy download. We noticed some people were exporting their blog every other day -- they were just creating a backup. We have several copies of their blog across several data centers, but these people felt more comfortable having their own copy on their own computers.

WM: There are other Google tools that run backups automatically, like Picasa, where you can sync your photo library in the desktop app to your album on the web, right?

BF: Right, and we've been doing some additional work with Picasa because we've recognized we can do a better job with syncing things like your photos' metadata.

WM: That's interesting because data portability on the social web isn't only about your data, it's also largely about your metadata -- your tags in Picasa and who they're attached to, who you follow in Blogger and your ratings and comments for their posts. Are those bits of metadata being taken into account?

BF: It's really hard to keep up with the features of individual services and the smaller bits, since they're all so different. I don't know if Blogger is exporting follow data. I know in Reader, you can get a list of the blogs you follow if you export your reading list to an OPML file, but you don't get a list of the posts you've starred. There's some education needed there, and some things that merit more attention. I don't think we have the answer for all that yet.
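
An OPML export like the one Brian mentions is plain XML, so the list of followed blogs is easy to read back out. The sketch below uses Python's standard library and assumes the common OPML layout, in which each subscription is an `outline` element with an `xmlUrl` attribute. The sample data and the `list_feeds` name are made up for illustration, and a real export may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the usual OPML shape; not a real export.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Tech" title="Tech">
      <outline text="Example Blog" title="Example Blog" type="rss"
               xmlUrl="http://example.com/feed" htmlUrl="http://example.com/"/>
    </outline>
    <outline text="Another Blog" type="rss"
             xmlUrl="http://example.org/atom.xml"/>
  </body>
</opml>"""


def list_feeds(opml_text):
    """Return (title, feed URL) pairs for every subscription in the file."""
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            title = outline.get("title") or outline.get("text")
            feeds.append((title, url))
    return feeds


if __name__ == "__main__":
    for title, url in list_feeds(SAMPLE_OPML):
        print(title, url)
```

As Brian notes, a file like this carries only the subscriptions; starred posts and similar metadata are not part of it.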

WM: It also raises the question of interoperability among social sites. There are emerging standards that don't yet have broad support but are gaining steam -- things like Portable Contacts, OAuth, Activity Streams. How much attention is Google paying to making sure its import and export systems play well with smaller social sites that are adopting these new open standards, versus, say, the attention being paid to bulk data export?

BF: I think that's more relevant to the teams working on products that touch on those standards. Our team currently has a pretty sharp focus on data you create in our apps or that you've imported into our apps -- making it so you can get that out. As far as interoperability, we're obviously big supporters, and anything we can do to make it any easier to build on the open web, we're doing.

For example, on OpenSocial we make it easy for developers to write apps that can be shared among different social networks. Google has also done work with OAuth. But the Data Liberation team is primarily concerned with helping you get your data in and out. It's sort of step one of n steps.

WM: OK, so that's your first order of business. Is there a list of tasks related to data liberation you've lined up to accomplish in your downtime?

BF: One thing we're studying is the fact that your hard drive capacity is expanding way faster than your network capacity. Your hard drive capacity increases by an order of magnitude every four years. So that means by 2017, you'll have a multi-terabyte iPod in your pocket.

Network capacity has only increased a little bit by comparison. Ten or so years ago, you had dial-up, or if you were super-advanced, you had DSL. The network speeds we have now are not a lot faster when paired with the growth of hard drive capacity. So, there are a lot of difficulties when dealing with larger data sets. How am I going to get 20 terabytes from Chicago to Mountain View quickly? I'm going to put it on hard drives and FedEx it.
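
The arithmetic behind both points is easy to check. The sketch below is a back-of-the-envelope calculation, not anything from Google. The link speeds are assumptions chosen for illustration, a terabyte is taken as 10^12 bytes, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def growth_factor(years, years_per_tenfold=4):
    """Capacity multiplier if storage grows tenfold every `years_per_tenfold` years."""
    return 10 ** (years / years_per_tenfold)


def transfer_days(terabytes, megabits_per_second):
    """Days needed to move `terabytes` over a link sustaining the given speed."""
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Eight years at "an order of magnitude every four years" is a 100x increase.
    print(f"8-year growth factor: {growth_factor(8):.0f}x")
    # Assumed sustained link speeds, for illustration only.
    for speed in (10, 100, 1000):
        days = transfer_days(20, speed)
        print(f"20 TB at {speed} Mbit/s: {days:.1f} days")
```

At an assumed sustained 100 Mbit/s, 20 terabytes takes roughly 18.5 days to send, which is why a box of hard drives shipped overnight can win.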

We're brainstorming about making it easier for people to move that size of data set around, or to gain access to that data.
