Building 16 at Facebook headquarters is home to the Fishbowl, Mark Zuckerberg’s private all-glass corner conference room that sits beneath a red vintage sign that reads "The Hacker Company." Not far from the sign -- a very visual proclamation that the social networking giant is eternally intent on building new stuff and improving the stuff it has already built -- you'll find one of the company's most important operations: the News Feed engineering team.
These are the programmers who oversee the Facebook tool that instantly streams all sorts of new information -- including status posts, Likes, links, and photos -- to more than a billion Facebook users across the globe. The team's ultimate task is to make sure your news feed delivers content you're actually interested in. That's important because Facebook wants you to keep using its social network, but also because this stream of information includes ads and other sponsored content, the stuff that makes the company money.
At the helm of this enterprise is Lars Backstrom, a 31-year-old with a computer science Ph.D. from Cornell University. "My day job is to improve the quality of News Feed," he says during a recent interview at Facebook HQ in Menlo Park, California.
This week, with a paper published on the online academic research site ArXiv.org, Backstrom revealed one of the recent fruits of his labor: an experimental algorithm that analyzes your personal network of friends, seeking to identify your strongest relationships. Developed alongside his former Cornell thesis adviser, Jon Kleinberg, the algorithm is strong enough to independently identify your spouse or romantic partner and even predict when you're headed for a breakup.
Yes, odds are you've already told Facebook who your romantic partner is -- via your profile page. But this algorithm does much more than that. It's not a party trick. It's a way for Facebook to better understand who you are and, ultimately, serve you more stuff that you want to see.
Backstrom's research is part of a growing movement at companies and universities to use machine learning and large amounts of online data to better understand human behavior and interactions and interests. "Extending our knowledge about people through the computational lens provided by large scale online services is unprecedented," says Eric Horvitz, managing co-director of the Microsoft Research lab in Redmond, Washington. "These kinds of data analytics are revolutionizing social science and changing our deep understanding of people as social beings."
Some projects will even explore how information that ripples across the web can help us better analyze the effects of the world we live in -- how Google, Microsoft and Yahoo searches can be used to detect drug side effects, for instance, or how social media can predict epidemics. Backstrom's algorithm predicts relationships, and as it turns out, that helps improve the online services that give us all that data in the first place. "There is a deep scientific interest in the structure of human ties," says Horvitz. "Understanding people's preferences and interests is core in providing an engaging and informative service."
What's more, an engaging and informative service can directly translate into profits in the form of improved sales and better advertising, and that means companies like Facebook, Microsoft and Google are doubly interested in this kind of research.
Backstrom's project draws from studies done in the 1980s by sociologist Scott Feld on the organization of social ties. But it introduces a new metric that can capture some of the complexity and nuances of social lives -- a metric that could be used to make predictions about people's activities and interests.
Dubbed dispersion, this metric measures how well two people's mutual friends are connected to one another: the score is high when those mutual friends come from otherwise separate circles and do not know each other. It's a departure from previous "embeddedness" models, which simply count the mutual friends two people share. Dispersion homes in on people who span diverse parts of your life, but who don't fit nicely into siloed, well-defined categories like coworkers, college classmates, and dance buddies.
The kinds of friends identified by dispersion are like an "echo of the person in the center, reaching out to the same places they do," says Kleinberg, the Cornell computer scientist who worked with Backstrom on the project. These friends may not rank highly on other measures of interaction -- such as messages sent and received, profile viewing, or tags in photos -- but they're extremely important people in your life. For example, you might not communicate as often with a cousin as you do with a coworker you see every day, but if your cousin announces on Facebook that she just got engaged, you'd definitely want to know that.
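To make the contrast between the two measures concrete, here is a minimal sketch in Python. It is not Facebook's code, and the article does not spell out the exact formula, so the scoring rule here is an assumption: dispersion is taken to be the number of pairs of mutual friends who are not linked to each other and share no other friend inside your network. The toy network, the names in it, and the helper functions are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(edges):
    """Build an undirected friendship graph as a dict of neighbor sets."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def embeddedness(graph, u, v):
    """Count the mutual friends of u and v."""
    return len(graph[u] & graph[v])


def dispersion(graph, u, v):
    """Simplified dispersion (an assumed formalization, not the paper's code).

    Counts pairs of mutual friends of u and v that are not friends with
    each other and have no common friend among u's friends other than v.
    """
    mutual = graph[u] & graph[v]
    score = 0
    for s, t in combinations(sorted(mutual), 2):
        if t in graph[s]:
            continue  # s and t know each other directly
        shared = (graph[s] & graph[t] & graph[u]) - {v}
        if not shared:
            score += 1  # s and t sit in separate parts of u's life
    return score


# Hypothetical toy network: five coworkers who all know one another,
# two college friends, two relatives, one gym buddy, and a partner who
# knows one person from each of the non-work circles.
coworkers = ["w1", "w2", "w3", "w4", "w5"]
friends = coworkers + ["c1", "c2", "f1", "f2", "g1", "partner"]
edges = list(combinations(coworkers, 2))
edges += [("c1", "c2"), ("f1", "f2")]
edges += [("partner", "c1"), ("partner", "f1"), ("partner", "g1")]
edges += [("you", friend) for friend in friends]
graph = build_graph(edges)

top_by_embeddedness = max(friends, key=lambda f: embeddedness(graph, "you", f))
top_by_dispersion = max(friends, key=lambda f: dispersion(graph, "you", f))
print(top_by_embeddedness, top_by_dispersion)
```

In this toy network a coworker has the most mutual friends with you, but all of those mutual friends know each other, so the coworker's dispersion is zero. The partner has fewer mutual friends, yet they are scattered across college, family, and the gym, which is what pushes the partner to the top of the dispersion ranking.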
If Facebook knows who your most important friends are, it knows you're likely to be interested in the stuff they post. But based on the behavior of those important friends, it can also better understand what's likely to interest you in general.
Birth of an Algorithm
Backstrom’s project got its start in the summer of 2011. At the time, Facebook was still stationed in Palo Alto, California, just across the street from Hewlett-Packard. Kleinberg was on sabbatical from Cornell, and he had come out to Silicon Valley for a week of brainstorming with his former student and several other Facebookers, including sociologists Thomas Lento and Cameron Marlow and data scientist Itamar Rosenn.
One particular afternoon, the group was sitting in a small conference room named for an '80s rock band -- Bon Jovi or something like that, Kleinberg recalls -- when Backstrom posed a question: What if you could get an algorithm to identify your relationship partner? Your spouse or boyfriend, after all, should be toward the top of the list of people whose content you want to see.
So Backstrom and crew came up with an algorithm and plugged in the networks of more than 1 million randomly selected Facebook users. After some training, the system learned to identify a person's romantic partner, which Backstrom used as a proxy for important friends in a person’s network. The algorithm was about twice as accurate at detecting a person's partner as embeddedness was. (Data for the experiment was labeled, but the identity of the partner was hidden from the algorithm by the researchers.)
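The comparison described above can be pictured as a simple scoring exercise: for each user, rank their friends by a metric, take the top-ranked friend as the guess, and check it against the hidden label. The Python sketch below assumes that setup. The function names, user IDs, and scores are all made up for illustration, and the numbers it prints say nothing about the study's actual results.

```python
def predict_partner(friend_scores):
    """Guess the partner as the friend with the highest score."""
    return max(friend_scores, key=friend_scores.get)


def top1_accuracy(scores_by_user, true_partner):
    """Fraction of users whose top-ranked friend matches the hidden label."""
    hits = sum(
        predict_partner(scores_by_user[user]) == partner
        for user, partner in true_partner.items()
    )
    return hits / len(true_partner)


# Hypothetical labels, withheld from the scoring step and used only to grade it.
true_partner = {"u1": "a", "u2": "b", "u3": "c", "u4": "d"}

# Invented per-friend scores under each metric (not real data).
dispersion_scores = {
    "u1": {"a": 5, "x": 1},
    "u2": {"b": 4, "y": 2},
    "u3": {"c": 1, "z": 3},
    "u4": {"d": 6, "q": 0},
}
embeddedness_scores = {
    "u1": {"a": 8, "x": 7},
    "u2": {"b": 9, "y": 2},
    "u3": {"c": 2, "z": 8},
    "u4": {"d": 1, "q": 4},
}

dispersion_accuracy = top1_accuracy(dispersion_scores, true_partner)
embeddedness_accuracy = top1_accuracy(embeddedness_scores, true_partner)
print(dispersion_accuracy, embeddedness_accuracy)
```

Keeping the labels out of the scoring functions mirrors the point in the parenthetical: the metric only ever sees the network, and the known partner is used solely to grade the guess afterward.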
What's more, partners that didn't have a high dispersion score were more likely to switch their Facebook status to single. And when the algorithm didn't spot a person's spouse or boyfriend, it usually picked out a sibling or family member -- another type of important person.
That's the important part. "For online services, understanding what interests people and the nature of relationships is important for enhancing the quality of the online experience and creating more engagement over time," says Microsoft's Horvitz.
The thing to remember is that some people don't use Facebook as actively as others. "A lot of people use News Feed, and they don’t like a lot of things. They don’t comment on a lot of things. They’re not providing a lot of signal back to Facebook about what they like to see. They’re more passively consuming their feed, and for those people, it’s hard for us to know what to show them," says Backstrom. Dispersion can help fill that gap.
Specifically, Backstrom says, plugging dispersion into the machine learning engine that powers News Feed could help Facebook personalize and organize content, improve friend recommendations, and better suggest friends to invite to events, as well as help users discover more relevant brands, pages, and groups by leveraging its existing entities graph.
Looking at links through the lens of dispersion could help the company understand how "you are different than the typical user and how to adapt your experience to that group," says Kleinberg. That could lead to more interesting -- and personalized -- suggestions. "Our online tools are failing us currently in that we group people and define groups by overarching things and miss other common ground," he says. "It would be nice to enrich the set of dimensions along which people have things in common."
Facebook hasn't yet incorporated dispersion directly into News Feed, though findings from this research have helped the team understand what kind of things to include in the service's ranking algorithms. To bring the project to fruition, they also need to expand it. "This worked on a million people," he says. "[But] there's three orders of magnitude between that and Facebook."