DeepMind’s AI has finally shown how useful it can be

AlphaFold has provided the clearest picture yet of the human proteome. Now DeepMind is making its work available to the world

Marcelo Sousa, a biochemist at the University of Colorado Boulder, had spent ten years trying to crack a particularly tricky puzzle. Sousa and his team had collected reams of experimental data on a single bacterial protein linked to antibiotic resistance. Working out its structure, they hoped, would help to find inhibitors that could stop that resistance from building. But, year after year, the puzzle remained unsolved. Then along came AlphaFold. Within 15 minutes, DeepMind’s machine learning system had solved the structure.

It’s the kind of result that could soon be repeated in labs across the world. In a paper published in the journal Nature, DeepMind has released over 350,000 predicted protein structures. Included in that is almost the entirety of the human proteome, the proteins that make up the human body. Within these predicted structures could lie key insights into diseases such as cancer and Alzheimer’s, the possibility of new drugs and even better ways to recycle plastic.

To put that number into context, the Universal Protein database, a collection of all the proteins that science has uncovered thus far, contains over 180 million protein sequences. These protein sequences tell us how the amino acids in a protein are ordered, but that’s only the beginning of the puzzle. To really understand how proteins function in the body, we need to know how that sequence determines the 3D structure of the protein – and that is a much more difficult task than simply knowing the right order of amino acids.

Of those 180 million protein sequences, scientists have so far worked out the structure of just 180,000 proteins. DeepMind’s new database provides predictions for more than double the number of known protein structures to date. Now biologists will be able to work on understanding how proteins interact and function – and beyond that, designing new proteins, enabling quicker drug discovery, deciphering disease-causing gene variations and more. “There’s much more to proteins than structure, and so we need to bring it together,” says Janet Thornton, a director emeritus of EMBL’s European Bioinformatics Institute. “It’s one component in that broader understanding of how life works.”

In the coming months, the AlphaFold team plans to release 100 million protein structures. “We’ll go from protein structures being a very precious resource to [them] dropping at every street corner,” says John Jumper, AlphaFold lead researcher. 

AlphaFold cracked the protein folding problem back in December 2020, when the DeepMind team won at CASP, the Critical Assessment of Protein Structure Prediction. At the time, the company promised it would make the data and code openly available. Less than eight months later, in July 2021, DeepMind published AlphaFold 2’s full code and methodology in Nature. Now it has announced that it will all be free to use through a partnership with the European Molecular Biology Laboratory (EMBL). The resource will be called the AlphaFold Protein Structure Database. “We believe that this represents the most significant contribution AI has made to advancing the status of scientific knowledge to date,” DeepMind’s CEO and co-founder Demis Hassabis said at a press briefing.

All living things on Earth are made from proteins – simple strings of amino acids that fold up from a linear chain into complex, compact 3D shapes. A protein can fold in a near-infinite number of ways before reaching its final structure. In 1972, during his Nobel prize acceptance speech, Christian Anfinsen proposed that the structure of a protein should be determined by its amino acid sequence. But proving that was a whole different ball game, and the protein folding problem has plagued and puzzled scientists for 50 years.

Traditionally, research has relied on expensive and time-consuming methods to work out structures, such as X-ray crystallography and electron microscopy. It can take from a few months to a year for a biologist to crack the puzzle; some have spent their whole PhD trying to solve a single one. “Even then, success is not guaranteed – some proteins are notoriously difficult to find structures of,” says Pushmeet Kohli, head of AI for science at DeepMind. With this new database, any researcher will be able to get the predicted structures of a huge number of proteins in mere minutes.

In its latest paper, the DeepMind team has shown AlphaFold in action, applying it to predict the structure of 98.5 per cent of human proteins. The team has also included the structures of the proteomes of 20 key model organisms important for biological research, such as the fruit fly and E. coli.

In order to guide researchers wanting to use the protein structure predictions in their own work, the team have provided confidence measures – labelling which predictions they have deemed to be the most reliable. Low confidence in a structure leaves researchers fumbling in the dark. But providing confidence metrics means that scientists will know which ones to rely on, and which predicted structures need to be double-checked using other methods. AlphaFold managed to predict over a third of residues – the amino acids that make up a protein – in the human proteome with very high confidence, and almost 60 per cent fall into the next highest confidence bracket. Putting the two brackets together, the system can predict the shape of the protein to near-experimental accuracy about two-thirds of the time. Before, despite years of research, only 17 per cent of the structures of the human proteome’s amino acids had been experimentally determined.

There are certain protein regions where AlphaFold could only provide a low confidence prediction, but the team still thinks this is an important finding, as opposed to a failure of the technology. When he and his colleagues first started seeing this result, they panicked, says Jumper. But when they looked closer, they realised that these regions in fact belonged to proteins that were known to be intrinsically disordered. “It has no fixed structure, and that’s why you get no answer. And that’s valuable for experimentalists,” says Jumper.

As was the case with Sousa, DeepMind has been leasing out its database to other researchers for some time. John McGeehan, a professor of structural biology at the University of Portsmouth, who is searching for enzymes that can biodegrade single-use plastics, used AlphaFold to test his team’s crystal structures against the structures it predicted. He found that the predictions were not only identical to the crystal structures but also contained even more information than the crystal structures were able to provide.

AlphaFold won’t entirely replace the use of experimental methods to determine structures, but rather the two will complement each other. For one thing, the areas where the prediction is not as confident will require other means to solve a protein’s structure. “I don’t think that we’re just yet at the point where we can just take the predictions at face value and assume that they are correct,” says Sousa. 

The success of AlphaFold in this paper may not come as a big shock to many scientists; rather, it serves more as confirmation of the already-suspected capabilities of such technology, says Andrei Lupas, the director of the Max Planck Institute for Developmental Biology and an assessor at CASP. Similar systems are following close behind. Academics from the University of Washington have already designed a protein structure prediction tool similar to AlphaFold 2, called RoseTTAFold. “I would say that by the end of this year, we will have several high performance protein structure predictors available,” says Lupas.

There may also be some scepticism amongst the structural biology community. The predicted structures are, after all, predictions, and the confidence levels can vary. “For structural biologists, I don’t think they will ever be out of a job, because they will be wanting to verify that these structures are right,” says Andrew Martin, a professor of bioinformatics and computational biology at University College London and former CASP entrant and assessor. “It’s clearly a huge advance over everything that’s around at the moment, but nonetheless, it’s not necessarily the final answer.” 

Fundamentally, the news shows that this is something that AI can just do better. “We’re rubbish at predicting protein structures,” says Jumper. Marrying machine learning and biology doesn’t just mean doing something better, it means doing something that humans can’t do at all. 

This article was originally published by WIRED UK