Computer Groupthink Under Fire

U.S. policy makers find themselves in the middle of a debate on how best to approach supercomputing research, as critics charge that too much attention is paid to clusters rather than true supercomputers. By Michelle Delio.


Clusters are cool, and grids are great, but neither one can replace a real supercomputer.

Critics at a House Science Committee hearing in July on the status of supercomputing in the United States claimed that federal agencies are focusing too heavily on developing and deploying grid computing and clusters, and not investing enough in development of true supercomputers.

At times, the committee hearing, intended to be a simple status report on the state of supercomputing, turned into a geeky and heated discussion on the relative benefits of supercomputers vs. the less-expensive grids and clusters.

Experts at the hearing also pointed out that the United States has fallen behind Japan in supercomputing. NEC's Earth Simulator in Japan is now the world's fastest supercomputer, according to the TOP500 ranking of the world's 500 fastest supercomputers.

Although the next five fastest supercomputers are all in the United States, the second-fastest machine in the world -- the Q supercomputing system at Los Alamos National Laboratory -- is only half as speedy as the Earth Simulator.

Supercomputer advocates say confusion over the capabilities offered by different types of high-powered computer configurations is diverting funds from supercomputer development.

"It's not a choose one -- grids, clusters or supercomputers -- question. We need all of them to solve different kinds of problems," said Daniel Reed, director of the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

Reed offered the task of adding a very large set of numbers as a simple example of how different types of jobs require different kinds of problem-solving power.

If you had 100,000 numbers to add, you could give 1,000 numbers each to 100 people, have them add those numbers independently, then add the 100 subtotals to get the result. Except for distributing the numbers initially and then collecting the subtotals to add them all together, no interaction between the "processors" would be required.
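In code, Reed's adding example is a textbook parallel reduction. The short Python sketch below is an illustration, not anything presented at the hearing: it splits 100,000 numbers into 100 chunks, sums each chunk in a worker process, and then adds the subtotals, with no interaction between workers beyond the initial split and the final combine.

```python
# Hypothetical illustration of Reed's adding example: an "embarrassingly
# parallel" sum, where workers need no interaction beyond the initial
# split of the numbers and the final combine of the subtotals.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "person" adds their share of the numbers independently.
    return sum(chunk)

if __name__ == "__main__":
    numbers = list(range(100_000))   # the 100,000 numbers to add
    workers = 100                    # the 100 "people"
    size = len(numbers) // workers   # 1,000 numbers apiece
    chunks = [numbers[i * size:(i + 1) * size] for i in range(workers)]

    with Pool() as pool:             # a handful of OS processes stand in for the 100 people
        subtotals = pool.map(partial_sum, chunks)

    total = sum(subtotals)           # add the 100 subtotals
    print(total == sum(numbers))     # True
```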

Other kinds of problems aren't so easily subdivided. For example, 100 people can't concurrently write a speech; this problem would be better handled by two or three skilled writers who work well together.

Different kinds of computing problems are like this too, Reed said. Some problems can be subdivided readily across a large number of relatively inexpensive processors, connected either locally (clusters) or by wide area networks (grids). Other problems require a more tightly connected set of powerful processors.

"To use a boating analogy, an armada of rowboats is not the same as an aircraft carrier, each serves a different need," Reed said.

At the hearing, officials from the National Science Foundation called for a renewed government focus and an increased investment in high-performance computing.

"When we hear that the U.S. may be losing its lead in supercomputing, that the U.S. may be returning to a time when our top scientists didn't have access to the best machines, that our government may have too fragmented a supercomputing policy -- those issues are a red flag that should concern all of us," said Rep. Sherwood Boehlert (R-N.Y.), head of the House Science Committee.

Supercomputing remains a priority for the National Science Foundation, an independent agency of the federal government, according to Peter Freeman, the NSF's assistant director for computer and information science and engineering.

But an NSF report released in February recommended incorporating supercomputers into computing grids, and stated that without being part of an overall infrastructure, supercomputers don't live up to their full potential.

The Defense Department recently announced plans to spend millions of dollars on supercomputing to boost its military and intelligence-gathering capabilities.

But Reed wonders if the United States really plans to make a financial commitment to the extent that would be required to build a supercomputer to rival or surpass Japan's Earth Simulator.

"Japan made a sustained, focused effort and investment," Reed said. But he added that the United States has seemingly opted to split its attention between supercomputers, grids and clusters.

It's not that Reed shuns clusters. The National Center for Supercomputing Applications announced Monday that it has purchased a new Intel Xeon-based Linux cluster from Dell Computer that will have a peak performance of 17.7 trillion floating-point operations per second, or teraflops, putting it at No. 3 in the world.

The new cluster will employ more than 1,450 dual-processor Dell PowerEdge 1750 servers running Red Hat Linux.
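That peak figure squares with a back-of-envelope check. The numbers below that aren't in the article -- the clock speed and the operations-per-cycle count -- are assumptions about Xeon processors of that generation, not specifications from NCSA or Dell.

```python
# Rough sanity check of the cluster's quoted 17.7-teraflop peak.
# Assumed figures (not from the article): 3.06 GHz Xeons issuing
# 2 double-precision floating-point operations per clock cycle.
nodes = 1450            # dual-processor PowerEdge 1750 servers
cpus_per_node = 2
clock_hz = 3.06e9       # assumed clock speed
flops_per_cycle = 2     # assumed per-processor throughput

peak_flops = nodes * cpus_per_node * clock_hz * flops_per_cycle
print(f"{peak_flops / 1e12:.1f} teraflops")   # ~17.7
```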

"This cluster's performance will plant us firmly as the world leader in using off-the-shelf commodity microprocessors to create supercomputers," said Reed. "Deployment of this system will be an important milestone for enhancing scientific discovery via computational science, and it will demonstrate the computing capability that can be provided to the research community through Linux clusters."

Some true supercomputers that could rival the Earth Simulator are being built now, but most won't be fully up to speed for a few years.

Last November, the Department of Energy awarded IBM a contract to build the two fastest supercomputers in the world, with a combined peak speed of up to 467 teraflops. Together, the two systems will have more processing power than the combined muscle of the 500 current fastest supercomputers.

Willow Christie of IBM media relations said the first system -- called ASCI Purple -- will be the world's first supercomputer capable of up to 100 teraflops.

The second supercomputer, Blue Gene/L, will have a theoretical peak performance of up to 367 teraflops with 130,000 processors running Linux. Both machines will be delivered in 2007.
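For what it's worth, the 467-teraflop combined figure from the contract announcement is simply the sum of the two machines' peak numbers:

```python
# The 467-teraflop combined peak is the sum of the two systems' peaks.
asci_purple = 100    # teraflops, peak
blue_gene_l = 367    # teraflops, theoretical peak
print(asci_purple + blue_gene_l)   # 467
```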

Phase one of IBM's supercomputer for the National Weather Service went live last month. When fully deployed in 2009, the system will be about four times faster than Earth Simulator, running at a peak speed well in excess of 100 teraflops, Christie said.

It would take one person using a calculator more than 80 million years to tabulate the number of calculations a 100-teraflop supercomputer can handle in a single second.
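That comparison hinges on how fast the hypothetical person works. The paces below are assumptions for illustration, not figures from IBM: at one calculation per second the answer comes out to roughly three million years, while the 80-million-year figure implies a more leisurely pace of about one calculation every 25 seconds.

```python
# How long would one second of 100-teraflop work take by hand?
# The calculations-per-second pace is an assumption for illustration.
flops = 100e12                        # operations in one second at 100 teraflops
seconds_per_year = 365.25 * 24 * 3600

one_per_second = flops / seconds_per_year            # ~3.2 million years
one_per_25_seconds = flops * 25 / seconds_per_year   # ~79 million years

print(f"{one_per_second / 1e6:.1f} million years at 1 calculation per second")
print(f"{one_per_25_seconds / 1e6:.1f} million years at 1 calculation per 25 seconds")
```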

The weather service machine is also pioneering a new way to deliver supercomputing power. The system is located at IBM's e-business Hosting Center in Gaithersburg, Maryland, and processing power and storage capacity are delivered to the National Weather Service over an ultrafast, Internet-based network.

"Leadership shouldn't be judged by one moment in time," said Christie.

But Reed said the government needs to define a plan to develop supercomputers now.

"First, we must deploy systems of larger scale using current designs and technologies to meet the demand for high-end systems in support of science and engineering research as well as national security needs such as cryptanalysis, signals intelligence and weapons design.

"Second, we must launch an integrated, long-term, research-and-development activity to design and build systems better matched to a broad range of application needs.

"The commercial computing market can do part of this, but it can't do it all alone," Reed added. "For example, there is a profitable commercial market for rowboats, sailboats and both inboard and outboard motorboats. But aircraft carriers are built only when governments fund them."