Linux Reconstructing Tree of Life

Scientists are using a Linux cluster computer to figure out the relationships of thousands of species on the evolutionary tree. The designer of the rig says Linux makes supercomputing easy. By Michelle Delio.

Reader's advisory: Wired News has been unable to confirm some sources for a number of stories written by this author. If you have any information about sources cited in this article, please send an e-mail to sourceinfo[AT]wired.com.

Demeter, Mother Earth to ancient Greeks, is now helping scientists unlock the mysteries of life.

Demeter is the name of the American Museum of Natural History's supercomputer. Built by biologist Ward Wheeler from off-the-shelf parts, the Linux cluster is now ranked the 107th-fastest computer in the world on the Top 500 supercomputers list.

Scientists are now using Demeter to create "The Tree of Life," a collaborative project involving biologists from around the world.

Funded by the National Science Foundation, the project is attempting to construct a pattern of relationships that biologists believe links all of Earth's present and past species -- from the smallest microbe to the largest vertebrate that existed during Earth's 4 billion-year history.

Wheeler, like many of the other scientists working at Manhattan's Museum of Natural History, is engaged in phylogenetic reconstruction -- research that correlates huge amounts of data to establish ancestral connections between species.

Wheeler's complicated research constantly exhausted his computer's processing power. So, ten years ago he decided to build himself a supercomputer.

His first rig was a cluster of 11 Hewlett-Packard workstations, each with 32 MB of RAM and 100-MHz PA-RISC chips. It was a nice setup for the time. But he didn't stop there.

In 1996, Wheeler added 10 Intel PCs running Linux. After completing that cluster in 1999, he built two more Linux rigs from scratch, taking advantage of faster hardware and more memory.

Demeter, the most recent addition, comprises 256 Xeon CPUs running at 2.8 GHz, each with 4 GB of RAM. The boards are connected over Myrinet, a high-speed network built for cluster computing.

Wheeler said he opted to build his own computer because he wanted to get the best performance "bang for the buck."

"It's relatively simple to build a computer -- takes 15 minutes and about 12 parts," Wheeler said. "We bought the parts wholesale and assembled it in my office. The next day we connected the computers into a cluster.

"The computer project has certainly grown over the years, but the real innovations that made this possible are the concept of cluster computing and the Linux operating system," Wheeler added. "Linux makes it so easy to create a supercomputer."

Wheeler is now in charge of the Tree of Life's $2.7 million spider project, which aims to resolve relationships among 40,000 species of spiders.

Wheeler's team will work to create a massive data matrix containing detailed information about each spider species -- attributes like size, diet and habitat. The matrix will then be processed to spot subtle relationships between the species, and to figure out how spiders made the transition from their family tree's base to the farthest branches.

Demeter will then analyze the data arrays to arrive at the best tree formation, among many possible trees, that could describe evolutionary relationships.
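
The article does not say which criterion Demeter uses to decide which tree is "best." One widely used criterion in phylogenetics is maximum parsimony, which prefers the tree that requires the fewest character-state changes. The sketch below assumes that criterion and scores three candidate trees with Fitch's algorithm. The four species names, the 0/1 character matrix and the function names are all made up for illustration and are not taken from the museum's software.

```python
# Hypothetical character matrix: one row per species, one column per
# character (for example, presence or absence of a trait). Illustrative only.
MATRIX = {
    "A": "0011",
    "B": "0010",
    "C": "1100",
    "D": "1101",
}

# Candidate trees as nested 2-tuples; leaves are species names.
TREES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch(node, matrix, i):
    """Return (possible ancestral states, changes needed) for character i."""
    if isinstance(node, str):  # a leaf: its state is observed, no changes yet
        return {matrix[node][i]}, 0
    left_states, left_cost = fitch(node[0], matrix, i)
    right_states, right_cost = fitch(node[1], matrix, i)
    shared = left_states & right_states
    if shared:  # children agree on at least one state: no extra change
        return shared, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, matrix):
    """Total number of state changes the tree needs across all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, matrix, i)[1] for i in range(n_chars))


scores = {name: tree_length(tree, MATRIX) for name, tree in TREES.items()}
best = min(scores, key=scores.get)
print(scores)  # {'AB|CD': 5, 'AC|BD': 8, 'AD|BC': 7}
print(best)    # AB|CD
```

With four species there are only three topologies to compare, so every one can be scored. A real analysis has to search a far larger space of trees, which is where a cluster comes in.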

Without a supercomputer, completing a project like the Tree of Life in any reasonable amount of time would be impossible, Wheeler said.
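
A rough way to see why is to count the candidate trees. For n species, the number of distinct unrooted, fully branching trees is the standard combinatorial result (2n-5)!!, the product of the odd numbers from 3 up to 2n-5. The short sketch below computes that figure; the function name is hypothetical, and the calculation is a general illustration rather than a description of how Demeter's software works.

```python
def count_unrooted_trees(n_species):
    """Number of unrooted binary trees on n labeled species: (2n-5)!!."""
    if n_species < 3:
        return 1  # one or two species can only be arranged one way
    count = 1
    # multiply the odd numbers 3, 5, ..., 2n-5
    for k in range(3, 2 * n_species - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20, 50):
    print(n, count_unrooted_trees(n))
# 10 species already allow 2,027,025 different trees
```

The count grows faster than exponentially, so for thousands of species checking every tree is out of the question. Analyses at that scale rely on heuristic searches that sample the space of trees, and those searches can be spread across many processors.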

"One of the most interesting parts of the Tree of Life project is that we combine information from all possible sources, like anatomy and genomic DNA, to create integrated pictures of life. This includes living and extinct creatures, which may outnumber the living 100 to 1. It is fascinating to trace the lineages of these creatures back in time."

Mark Norell, paleontology curator at the Museum of Natural History, will also use Demeter in his own Tree of Life research. Norell's team is working to discover family-group relationships among archosaurs.

Norell's team also will start by creating a data matrix of species and characters for theropod dinosaurs (the archosaur subgroup that includes birds and their dinosaur relatives). The team then will combine that matrix with a companion project on modern birds.

The resulting combined bird and theropod database will eventually be turned into a Web-based "supermatrix" of some 2,000 species. Scientists will be able to export the data for their own analysis.

A book on the project is set for publication in late 2003.