Is DNA the Future of Computer Storage?

We’re producing more information than ever, and data centers are struggling to keep up. Nature’s original storage system may provide an answer…

HSBC x Wired Consulting | Frontiers Biotech | DNA Storage 2

In 2022, humans created just under 100 trillion gigabytes (about 100 zettabytes) of data, and that annual volume is projected to nearly double by 2025. As a result, pressure is growing to find new ways to store it, because traditional methods are bursting at the seams.

But to some scientists, the best answer has existed for billions of years: DNA. If it can store genetic code, can it also store the world’s digital information?

“Synthetic DNA has the potential to store orders of magnitude more data than today’s devices, and in a manner that promises to be much more sustainable,” says Karin Strauss, Senior Principal Research Manager at Microsoft Research and an Affiliate Professor at the University of Washington’s School of Computer Science and Engineering.

Strauss has led research into emerging memory technologies, which have drawn increasing interest as global data volumes have mushroomed. Media that once existed physically now exist predominantly in digital form; the world is being photographed from space like never before; businesses increasingly operate on cloud platforms; and scientists are amassing vast research data sets. The billions of connected devices and sensors that make up the Internet of Things are forecast to generate nearly 80 zettabytes (80 trillion gigabytes) of data in 2025.

A particularly pressing question is where to store archival data: information that is rarely accessed after it is created. Companies have an incentive to keep rather than discard this kind of information. Partly that’s because the public doesn’t expect data to disappear. But a more significant driver is the rise of artificial intelligence and analytics: the more data you have, and the higher its quality, the more powerful your algorithms can be.

There are various conventional ways to approach this problem. Tape drives consume minimal energy, but accessing the data is slow and preserving it is costly. Solid-state and hard disk drives are an appealing alternative because of their low access latency, which allows specific data to be retrieved promptly. But all of these media have finite lifespans, so data must be migrated periodically to newer storage. This recurrent migration generates significant waste, as old hard drives and tapes are typically destroyed after use. Solid-state and hard drives also draw power continuously to keep their data available for retrieval, and significant portions of today’s archival data are stored in vast data centers packed with them. These facilities not only take up a great deal of physical space but also produce large amounts of greenhouse gas emissions.

And still we can’t build them quickly enough. “The capacity of existing storage media is not keeping up with the ever-growing demand for data storage,” says Strauss. This has ignited a search for a more efficient storage medium, particularly for archival cloud storage.

The idea of stashing digital information on synthetic strands of DNA has existed since the 1960s, inspired by the fact that DNA is itself a storage system. It is made up of chemical building blocks called nucleotides, each composed of a sugar group, a phosphate group, and one of four nitrogenous bases. Each base is identified by a letter: A (adenine), T (thymine), G (guanine), or C (cytosine). It is the sequence of these bases that encodes the biological information in any strand of DNA.

Digital information exists as binary code, and DNA storage works by translating its zeroes and ones into sequences of those four letters. Because there are four bases, each one can stand for a pair of bits: 00 might equal A, for instance, and 10 might equal G. DNA containing that sequence can then be synthesized, stored, and, at a later stage, sequenced and decoded back into the original text, say, or video.
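To make the translation step concrete, here is a minimal sketch in Python. It assumes a hypothetical two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) chosen only to agree with the examples above; the function names `encode` and `decode` are illustrative. Real DNA storage systems use more elaborate codes that add error correction and avoid sequences that are hard to synthesize or read, so this shows the idea rather than any production scheme.

```python
# Hypothetical two-bits-per-base mapping, chosen to match the examples in the
# text (00 -> A, 10 -> G). Real systems use more elaborate, error-tolerant codes.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a DNA sequence, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)  # each byte -> 8 bits
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Translate a sequence produced by encode() back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, since two bits per nucleotide is the most that a four-letter alphabet can carry; practical codes give up some of that density in exchange for robustness against synthesis and sequencing errors.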

The appeal is that DNA can store massive quantities of information at a high density: around one exabyte (one billion gigabytes) per cubic inch. DNA is also durable (it can last tens of thousands of years under the right conditions) and needs little energy to maintain.

“It would take billions of tape drives—the current densest commercial storage media—to store tens of zettabytes of information,” says Strauss. “Whereas it would take the footprint of one small refrigerator if stored in synthetic DNA.”
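Strauss’s refrigerator comparison can be checked with a back-of-envelope calculation. The sketch below simply takes the density figure quoted above (roughly one exabyte per cubic inch) at face value as an assumption, uses decimal prefixes (1 ZB = 1,000 EB), and converts cubic inches to litres; the function name `dna_volume_litres` is illustrative.

```python
# Back-of-envelope volume estimate. Assumes the article's approximate density
# of one exabyte per cubic inch; this is not a measured specification.
EXABYTES_PER_CUBIC_INCH = 1.0
EXABYTES_PER_ZETTABYTE = 1_000          # decimal (SI) prefixes
LITRES_PER_CUBIC_INCH = 0.016387064     # exact, since 1 inch = 2.54 cm


def dna_volume_litres(zettabytes: float) -> float:
    """Approximate volume of DNA needed to hold the given number of zettabytes."""
    cubic_inches = zettabytes * EXABYTES_PER_ZETTABYTE / EXABYTES_PER_CUBIC_INCH
    return cubic_inches * LITRES_PER_CUBIC_INCH


if __name__ == "__main__":
    for zb in (10, 20, 50):
        print(f"{zb} ZB -> about {dna_volume_litres(zb):.0f} litres")
```

Under that assumption, tens of zettabytes work out to between roughly 160 litres (10 ZB) and about 820 litres (50 ZB): refrigerator-scale, in line with Strauss’s comparison.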

Research into the idea has exploded in recent years. Various companies are developing the technology, some working on synthesizing or reading DNA, others on translating binary code into the DNA alphabet. In 2020, Microsoft co-founded the DNA Data Storage Alliance, bringing together 41 organizations with the twin aims of realizing the potential of DNA storage and promoting specifications and standards to aid interoperability. And there have been proofs of concept. Scientists have already encoded books into DNA and, recently, a startup released a credit-card-sized device that can store a kilobyte in DNA form.

For businesses, the potential rewards are significant. The global data storage market was valued at $217.02 billion in 2022 and is projected to reach $777.98 billion by 2030. According to one recent report, the emerging DNA storage market is projected to reach $3.34 billion by 2030.

DNA storage is not going to replace traditional data centers, especially for data that must be retrieved quickly. But eventually it might enable archival data to be stored in greener, more compact data centers that produce minimal waste and carbon emissions. In these centers, files would be encoded, synthesized into DNA, and stored in capsules. To read a file, a robotic arm would remove its capsule, read the contents, and put the capsule back.

If that sounds a long way off, that’s because it is. DNA synthesis remains expensive, so for now it is practical only for small amounts of extremely valuable data. There is also a need to increase how much data a single device can write simultaneously, something Strauss herself has explored with a team of Microsoft and University of Washington researchers. In 2021, they demonstrated that reasonable write speeds are achievable, and she expects those speeds to improve further. “Technologies to write data to synthetic DNA are improving quite rapidly, and recent developments, such as a nanoscale DNA storage writer we developed with University of Washington, show paths toward commercial-scale DNA data storage,” says Strauss.

Strauss and her University of Washington colleagues have also turned their attention to another crucial challenge: how to pick out a desired file from a mixture of many pieces of DNA. They have demonstrated that DNA molecules themselves can find images that look similar to a target image. This ability would make it possible to locate files without decoding an entire database, perhaps paving the way for new kinds of computers altogether. “The ability of DNA molecules to perform computations, alongside their storage capacity, opens up new possibilities for the future of computing,” she says. “Showing that such processes scale to trillions of data items will be the next frontier in applying DNA technology to information technology.”