How turning science into a game rouses more public interest

This article was taken from the October 2014 issue of WIRED magazine.

Chris Lintott first met Kevin Schawinski in the summer of 2007 at the astrophysics department of the University of Oxford. Lintott had just finished a PhD at University College London on star formation in galaxies. He was also something of a minor celebrity in the astronomy community: he was one of the presenters of the BBC's astronomy programme The Sky at Night alongside Sir Patrick Moore, and had written a popular science book called Bang!: The Complete History of the Universe with Moore and Brian May, the Queen guitarist and astrophysicist. "I went to give a seminar talk as part of a job interview," Lintott recalls. "And this guy in a suit jumped up and started having a go at me because I hadn't checked my galaxy data properly. I thought it was some lecturer who I'd pissed off, but it turned out to be Kevin [Schawinski], who was a student at the time."

Most galaxies come in two shapes: elliptical or spiral. Elliptical galaxies can have a range of shapes, from perfectly spherical to a flattened rugby-ball shape. Spirals, like the Milky Way, have a central bulge of stars surrounded by a thin disk of stars shaped in a spiral pattern known as "arms". The shape of a galaxy is an imprint of its history and how it has interacted with other galaxies over billions of years of evolution. Why galaxies have these shapes, and how the two geometries relate to one another, remains a mystery to astronomers. For a long time, astronomers assumed that spirals were young galaxies, with an abundance of stellar nurseries, where new stars were being formed. These regions typically emitted hot, blue radiation. Elliptical galaxies, on the other hand, were thought to be predominantly old, replete with dying stars, which are colder, and therefore have a red colour. Schawinski was working on a theory which contradicted this paradigm. To prove it, he needed to find elliptical galaxies with blue regions, where star formation was taking place.

At the time, astronomers relied on computer algorithms to filter datasets of images of galaxies. The biggest bank of such images came from the Sloan Digital Sky Survey. Taken by an automated robotic telescope in New Mexico with a two-metre mirror, its images contained more than two million astronomical objects, nearly a million of which were galaxies. The problem was that computers could easily filter galaxies by colour, but it was impossible for an algorithm to pick out galaxies by shape. "It's really hard to teach a computer a pattern-recognition task like this," says Schawinski, currently a professor in astronomy at the Swiss Federal Institute of Technology in Zurich. "It took computer scientists a decade to [teach a computer] to tell human faces apart, something every child can do the moment they open their eyes." The only way to prove his theory, Schawinski decided, was to look at each galaxy image, one by one.

Schawinski did it for a week, working 12 hours every day. He would go to his office in the morning, click through images of galaxies while listening to music, break for lunch, and continue until late in the evening. "When I attended Chris's seminar, I had just spent a week looking through fifty thousand galaxies," says Schawinski.

When Lintott moved to Oxford, he and Schawinski started debating the problem of how to classify datasets with millions of images. They weren't the only ones. "Kate Land, one of my colleagues, was intrigued about a recent paper which claimed most galaxies were rotating around a common axis," Lintott says. "Which is indeed puzzling because the expectation was that these axes would be totally random." Land needed more data, which required looking at the rotation of tens of thousands of galaxies. "Out of the blue she asked me if I thought that, if they put a laptop with galaxy images in the middle of a pub, would people classify them?" Lintott recalls.

At the time, Nasa had launched a project called Stardust@home, which had recruited about 20,000 online volunteers to identify tracks made by interstellar dust in samples from a comet. "We thought that if people are going to look at dust tracks, then surely they'll look at galaxies," says Lintott. Once it was decided they would go ahead with the project, they built a website within days. The homepage displayed the image of a galaxy from the dataset. For each image, the volunteers were asked if the galaxy was a spiral or elliptical. If a spiral, they were asked if they could discern the direction of its arms and the direction of its rotation. There were also options for stars, unknown objects and overlapping galaxies.

The site, called Galaxy Zoo, launched on July 11, 2007. "We thought we would get at least some amateur astronomers," Lintott says. "I was planning to go to the British Astronomical Society, give a talk and get at least 50 of their members to classify some galaxies for us." Within 24 hours of its launch, Galaxy Zoo was receiving 60,000 classifications per hour. "The cable we were using melted and we were offline for a while," Schawinski says. "The project nearly died there." After ten days, users from all over the world had submitted eight million classifications. By November, every galaxy had been seen by an average of 40 people. Galaxy Zoo users weren't just classifying galactic shapes, they were making unexpected discoveries. Barely a month after launch, Dutch schoolteacher Hanny van Arkel discovered a strange green cluster that turned out to be a never-before-seen astronomical object. Christened Hanny's Voorwerp ("voorwerp" means "object" in Dutch), it remains the subject of intense scientific scrutiny.

Later that year, a team of volunteers compiled evidence for a new type of galaxy -- blue and compact -- which they named Pea galaxies. "When we did a survey of our volunteers we found out they weren't astronomers," Lintott says. "They weren't even huge science fans and weren't that interested in making new discoveries. The majority said they just wanted to make a contribution."

With Galaxy Zoo, Schawinski and Lintott developed a powerful pattern-recognition machine, composed entirely of people who could not only process data incredibly quickly and accurately -- aggregating the results via a democratic statistical process -- but also enable individual serendipitous discoveries, a fundamental component of scientific enquiry. With robotic telescopes spewing terabytes of images every year, they found an answer to big data in a big crowd of volunteers. Since Galaxy Zoo's first discoveries, this pioneering approach of crowdsourcing science has gained a strong following not only with the general public but also within the scientific community. Today, there are hundreds of crowdsourcing projects involving a variety of scientific goals, from identifying cancer cells in biological tissues to building nanoscale machines using DNA. These endeavours have resulted in breakthroughs, such as Schawinski and Lintott's discoveries on the subject of star formation, that have merited publication in the most respected scientific journals. The biggest breakthrough, however, is not the scientific discoveries per se, but the method itself. Crowdsourcing science is a reinvention of the scientific method, a powerful new way of making discoveries and solving problems that could otherwise have remained undiscovered and unsolved.
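For readers curious what a "democratic statistical process" might look like in practice, here is a minimal Python sketch of majority voting over one galaxy's classifications. It is an illustration only, not Galaxy Zoo's actual pipeline: the function name `aggregate_votes`, the thresholds `min_votes` and `min_agreement`, and the vote data are all assumptions made up for this example.

```python
from collections import Counter


def aggregate_votes(votes, min_votes=5, min_agreement=0.8):
    """Return (label, agreement) for one galaxy.

    The label is None when there are too few votes or the crowd is split.
    Both thresholds are illustrative assumptions, not Galaxy Zoo's values.
    """
    if len(votes) < min_votes:
        return None, 0.0
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement < min_agreement:
        # No clear majority: leave the galaxy unclassified.
        return None, agreement
    return label, agreement


# Made-up classifications from 40 volunteers for each of two galaxies.
clear_case = ["spiral"] * 36 + ["elliptical"] * 4
split_case = ["spiral"] * 21 + ["elliptical"] * 19

print(aggregate_votes(clear_case))
print(aggregate_votes(split_case))
```

In this sketch a galaxy only receives a label when a large share of volunteers agree; a split vote is left unlabelled, which is one simple way to flag objects that may deserve a closer look.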

At around the time Lintott and his team were developing Galaxy Zoo, two computer scientists at the University of Washington in Seattle, Seth Cooper and Adrien Treuille, were trying to use online crowds to solve a problem in biochemistry called protein folding. A protein is a chain of smaller molecules called amino acids. Its three-dimensional shape determines how it interacts with other proteins and, consequently, its function in the cell. Proteins only have one possible structure, and finding that structure is a notoriously difficult problem: for a given chain of amino acids, there are millions of ways in which it can be folded into a three-dimensional shape. Biochemists know thousands of sequences of amino acids but struggle to find how they fold into the three-dimensional structures that are found in nature.

Cooper and Treuille's lab had previously developed an algorithm which attempted to predict these structures. The algorithm, named Rosetta, required a lot of computer power, so it was adapted to run as a screensaver that online volunteers could install. The screensaver, called Rosetta@home, required no input from volunteers, so Cooper and Treuille had been brought in to turn it into a game. "With the screensaver, users could see the protein and how the computer was trying to fold it, but they couldn't interact with it," Cooper says. "We wanted to combine that computer power with human problem-solving."

Cooper and Treuille were the only computer scientists in their lab. They also had no idea about protein folding. "In some sense, we were forced to look at this very esoteric and abstract problem through the eyes of a child," Cooper says. "Biochemists often tell you that a protein looks right or wrong. It seemed that with enough training you can gain an intuition about how a protein folds. There are certain configurations that a computer never samples, but a person can just look at it and say, 'that's it'. That was the seed of the idea."

The game, called Foldit, was released in May 2008. Players start with a partially folded protein structure, which has been arrived at by the Rosetta algorithm, and have to manipulate its structure by clicking, pulling and dragging amino acids until they've arrived at its most stable shape. The algorithm calculates how stable the structure is; the more stable, the higher the score. "When we first trialled the game with the biochemists, they weren't particularly excited," Cooper says. "But then we added a leaderboard, where you could see each other's names and respective scores. After that, we had to shut down the game for a while because it was bringing all science to a halt."

Foldit turned the goal of solving one of biochemistry's hardest problems into a game that can be won by scoring points. Over the past five years, more than 350,000 people have played Foldit; these players have been able to consistently fold proteins better than the best algorithms. "Most of these players didn't have a background in biochemistry and they were beating some of the biochemists who were playing the game," Cooper says. "They also discovered an algorithm similar to one that the scientists had been developing. It was more efficient than any previously published algorithms."

In 2011 a group of Foldit players folded a protein from the Mason-Pfizer monkey virus, which causes Aids in rhesus monkeys. Biochemists had been trying to figure out this structure for over ten years. The Rosetta algorithm had been unable to solve it -- the players managed it in three weeks (their structure was subsequently confirmed by experimental data). "It was a team effort by three players," says Cooper. "They worked as a group, not at the same time, but improving on each other's work." When Cooper told them that they had solved the problem and asked them whether they wanted their individual names on the scientific paper, they declined. Instead, they wanted the name of their group to be included in the list of authors: "Foldit Contenders".

"Machine learning is this juggernaut in computer science now," Treuille says. "But one of the reasons to crowdsource science is human learning. Humans have this trick up their sleeve, this cycle of experiment, hypothesis and results." After Foldit, Treuille moved to Carnegie Mellon in Pittsburgh where he developed another scientific game called EteRNA, which tasks players with designing molecules of ribonucleic acid (RNA). "When players started designing their molecules for the first time, they weren't very good," Treuille says. "But they quickly figured what to do. In six months, for a given puzzle, the best computer design would typically be worse than the worst player design." Not only that, Treuille noticed that the players were having increasingly sophisticated discussions on the message board. "There was a blog post by this player called Chris Cunningham, a graduate student in computer science," Treuille says. "He wrote a meta-analysis of some of the designs that people were posting online. What was pretty astonishing was that I don't even fully understand what they were talking about. They were creating their own technical jargon."

Galaxy Zoo harnesses its volunteers to undertake image- and pattern-recognition tasks that scientists don't have the resources to complete. "For decades, [researchers] in computer science have been suggesting that we should convert big-data problems into games," Treuille says. "Now we can do this on a massive scale."

With Foldit and EteRNA, the best work is not done via a statistical aggregation, but by its most elite players. "A way of thinking about this is that we've searched through 100,000 people and found a sub-set who were amazing at this esoteric task," Treuille tells WIRED on the phone from Pittsburgh. "Our top players are not biochemistry or computer-science graduates, but they are better than any graduates who work on this project." After our conversation, Treuille sends WIRED a video interview with the world's number-one Foldit player. In it, a young woman describes her gaming: "I like to take a messy protein and turn it into a beautifully streamlined structure where everything is symmetrical and things are tucked in and nothing is hanging out. Yes, that's what I like doing," she says. "I'm an admin worker in a rehab team. I'm just answering telephones, working on bespoke computer programs, interacting with staff. It's repetitive. When I go home I become a different person. I like to measure myself against something and it's given me something that my everyday life hasn't given me, which is to use abilities I didn't know I had."

In 2009, Galaxy Zoo evolved into a larger platform called Zooniverse, which currently runs more than 20 scientific projects for a community of more than a million people. "When we started Galaxy Zoo, we thought we would get a few people from amateur astronomy clubs," says Robert Simpson, a researcher and web developer at Zooniverse, when WIRED visits his offices at the University of Oxford's astrophysics department. "Four thousand people took part. We're starting to run out of galaxies." According to Simpson, the Zooniverse community contributes more than 50 years of effort per annum and over 70 scientific papers have resulted. "We don't let them sleep," he jokes. "It's a very good effort if you work out their holidays and account for the time they need to go to the toilet. We don't just get one person's opinion, we get a vote on everything we ask, and that's very powerful in terms of creating a catalogue of things that only a human can find."

Simpson shows WIRED an artist's impression of a planet with four suns. It was found by Planet Hunters, another Zooniverse project. "It's a bit like Tatooine," Simpson says. "This system has two stars in the centre, the planet revolves around them and two other stars orbiting around them. The computers didn't spot it because no one even thought of writing an algorithm to look for this kind of system."

Zooniverse now ranges from archaeology to nature conservation. Snapshot Serengeti, for example, uses images from 225 cameras placed in a grid across the Serengeti National Park. The cameras are triggered by motion and heat, taking images of animals in various poses. "The idea is to make a census of the large predators and their prey," Simpson says. "Users are asked what species they see and what they are doing. They are mapping migration patterns of these animals over a long period of time." On Simpson's office door, there's a list titled "Army size", with the names of various countries crossed out, apart from India ("1.3 million"), the US ("1.6 million") and China ("2.3 million"). "We just crossed out North Korea. We used to compare ourselves to how many football stadiums we could fill with our members, but we had to upgrade to armies."

The significance of Zooniverse's popularity is best illustrated by the science projects that it has inspired. Consider Eye Wire, a game launched in 2012 that asks players to map neuronal circuitry. It was developed by Sebastian Seung, a neuroscientist at Princeton University and a former MIT professor who is attempting to comprehensively map how neurons are connected within our brain. This sort of wiring diagram, which Seung calls the connectome, is a complex network made of a hundred billion neurons wired together by an average of 100,000 connections each (see WIRED 07.12). It's a daunting task that Seung has been working on since 2006, when he started developing artificial-intelligence algorithms that could do the job automatically. "We made progress, but years have passed and computers still can't do the job," Seung says. "The way to solve this would be a combination of human and artificial intelligence."

Eye Wire uses scans taken from a mouse's retina. The retina is a part of the eye but it's also composed of neurons which process visual input. The retina sample used in Eye Wire was shaved off in 30-nanometre-thick square layers and each layer was imaged by an electron microscope. Each slice presents a cross section of the thousands of neurons, an image not dissimilar to a slice of salami, with the boundaries of neurons represented in darker and well-defined lines. Reconstructing the three-dimensional version of this sliced piece of retina from a stack of two-dimensional slices usually takes about 50 hours, a task that Seung compares to completing an extremely complicated colouring book. "In that image the neurons are all tangled up," Seung says. "To get the shape of a single neuron you have to follow its branches throughout the image. The task is hard even for humans." Because it typically takes tens of hours to get reasonably good at this task, Eye Wire players first have to pass a test of accuracy and overcome various challenges to get a chance at mapping the most difficult neurons. "We allow everyone to participate, but the idea is to identify the experts," Seung says. "One of our lab technicians plays a character called the Grim Reaper, who oversees the entire cell. When it's obvious that there's a false branch growing out of a neuron that shouldn't be there, he chops it off. He's correcting the crowd. But some in the community were becoming as good as the Grim Reaper, so we promote them to a higher order of players, which we call the Order of the Scythe."
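The colouring-book task described above, following one neuron through a stack of two-dimensional slices, can be pictured as a three-dimensional flood fill. The Python sketch below is a toy illustration under strong assumptions and is not Eye Wire's actual method: the function `trace_neuron`, the text encoding of slices ('#' for a boundary, '.' for cell interior) and the tiny `toy_stack` are all invented for this example. Real electron-microscope images are noisy greyscale pictures with ambiguous boundaries, which is precisely why the article says computers still need human help.

```python
from collections import deque


def trace_neuron(stack, seed):
    """Collect every voxel reachable from `seed` without crossing a boundary.

    `stack` is a list of 2D slices, each a list of equal-length strings,
    where '#' marks a cell boundary and '.' marks cell interior.
    `seed` is a (z, y, x) position assumed to lie inside a cell.
    """
    depth, height, width = len(stack), len(stack[0]), len(stack[0][0])
    # Neighbours: one step up/down the stack or along a row/column.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if inside and (nz, ny, nx) not in seen and stack[nz][ny][nx] == ".":
                seen.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return seen


# Two made-up slices: a wall of '#' separates a left cell from a right cell.
toy_stack = [
    ["..#..", "..#.."],
    ["..#..", "..#.."],
]

left_cell = trace_neuron(toy_stack, (0, 0, 0))
print(len(left_cell))
```

Starting from a seed in the left-hand cell, the fill spreads within and between slices but stops at the wall, so the right-hand cell is never coloured in. A "false branch" of the kind the Grim Reaper chops off would correspond to a gap in the wall that lets the fill leak into a neighbouring cell.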

In May 2014, Eye Wire published its first paper, "Space-time wiring specificity supports direction selectivity in the retina", in the journal Nature. The paper presented the discovery of a neural circuit in the retina whose function is to detect moving visual stimuli. It was a problem that had eluded neuroscientists for more than 50 years, since it was first described in theory. More than 120,000 people have played the game since launch, and 2,183 had their name in the paper because of their contributions. "I don't want to say that there was no other way to do this," Seung says. "But it so happens that we got there first."

One morning in June 2014, Simpson and his team met Andrew Bastawrous, an eye surgeon from the London School of Hygiene & Tropical Medicine, and Tunde Peto, an ophthalmologist from Moorfields Eye Hospital. Bastawrous has developed a smartphone add-on that can diagnose eye disease (WIRED 08.14), an app called Peek (Portable Eye Examination Kit), which he is using in clinical studies in Kenya.

Simpson and Bastawrous, both TED Fellows, met in March 2014 at TED's conference in Vancouver, and thought Zooniverse might help Bastawrous's project. "Lately we have been joking that we should be doing something more worthwhile than just finding new planets and galaxies," laughs Simpson. "Helping Andrew is in that category of projects that make a difference." At the meeting, which took place at the University of Oxford, Bastawrous showed a picture that he took in a village in Kenya of an elderly woman. Her eyes were cloudy, her face expressionless. "I see ladies like her in the clinic every day," Bastawrous says. The woman had glaucoma, an eye disease that develops slowly but leads to blindness if not treated early. "Unfortunately it was too late. The worst thing is looking into someone's eye and seeing that there's nothing you can do."

Bastawrous needs a way of quickly evaluating the thousands of images that come from eye clinics. "It's relatively easy to spot images that show abnormal features," says Peto, whose team does image analysis for Peek. "But when you see nothing, you spend a lot of time looking for minor features. It's very time consuming." "Getting the Zooniverse crowd to analyse these images is a typical medium-sized project," Simpson adds. "We could get this done in about a week."

On hearing this, Bastawrous is visibly cheered. "I'm excited that it's possible to have thousands of people looking at these images."

Simpson mentioned Zooniverse's Milky Way Project, which allows volunteers to annotate images of regions of our own galaxy. "We find new features based on people annotating an image," Simpson says. "It's a different model for data analysis. In your case, you can get people to annotate the retina."

Peto agrees that this method would be ideal for finding new patterns in the eye. "We recently found that certain spot patterns in the retina [can indicate] early-stage dementia," she says. "I gave the images to a bunch of junior graders, who had no idea that these came from dementia patients. They kept marking one feature. I never spotted that pattern because I had strong preconceived ideas of what we would observe."

It was a lesson that Simpson, Lintott and Schawinski also learned from Galaxy Zoo. When Schawinski decided to look for a new signal -- blue elliptical galaxies -- he had a major disadvantage: he was an expert. He had always been told that elliptical galaxies were not blue and, although he found what he was looking for, that bias got in the way. When his team made the data available to the public, the crowd was not only much faster, they were also more accurate.

Zooniverse, Foldit and Eye Wire encapsulate what makes crowdsourcing so powerful. But the possibility of crowdsourcing science is, in itself, a surprising discovery. It's difficult not to be amazed by the fact that crowds of amateurs seem to be able to crack problems, especially when we consider how science is such a highly specialised endeavour, one which allegedly requires years of study and research before the scientist can even hope to make a relatively original discovery.

But crowds of enthusiastic amateurs are not the only ingredient that contributes to the success of crowd-sourced science projects. If scientists simply made their data and their algorithms available to the public, in the exact same form and format in which they are used in the labs, the crowd wouldn't know where to begin. The key is making it accessible: astronomers turn the problem of morphological classification of galaxies into a game; biochemists re-imagine protein folding as a puzzle; neuroscientists make mapping the brain's connectome into filling in a colouring book. It is only when esoteric scientific problems are divested of jargon, deconstructed into their most basic elements -- and made fun -- that the crowd can come out and play.

This article was originally published by WIRED UK