By 2005, the Human Genome Project will have transcribed the entire programming language of human life.
During the past five years a slow collision of epic proportions has united two disparate fields of science. The result promises to be an explosion of new knowledge and power that will sever us from our human heritage and transform us in ways that we cannot yet imagine. If that sounds like overstatement, so be it; this is one occasion where reality should have no trouble matching - or exceeding - journalistic hype.
The slow collision is between computer science and human biology. A new breed of dual-talented researchers has already emerged. They call themselves computational biologists. Their 3-D modeling software and
far-reaching, fuzzy-logic search algorithms are now revealing precisely how genes control our susceptibility to countless diseases ranging from atherosclerosis to breast cancer - and how we can not only cure ourselves but transcend the human condition.
Some time between 2001 and 2005, the Human Genome Project - a global research endeavor coordinated by the National Institutes of Health in Bethesda, Maryland - will complete its US$3 billion mission to transcribe the code that controls the creation of human life. At that time, the entire molecular sequence of human DNA will have been reduced to a string
of bits that would occupy about 750 megs on your hard drive (somewhat less with data compression). This code will be freely available. In fact, anyone with
a modem and browser software can download preliminary pieces of it right now from the project's GenBank Web site. (This is a public service; US tax dollars at work. You don't even need to register. See Related Links on page 204.)
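That 750-megabyte figure is easy to verify: with only four possible bases in DNA, each position in the sequence needs just two bits. A quick back-of-the-envelope check, using the roughly 3 billion base pairs cited later in this article:

```python
# Each DNA base is one of four letters (A, C, G, T), so two bits
# suffice to encode a position: 4 = 2**2.
BITS_PER_BASE = 2
GENOME_LENGTH = 3_000_000_000   # ~3 billion base pairs

total_bits = GENOME_LENGTH * BITS_PER_BASE
total_bytes = total_bits // 8

print(total_bytes // 10**6)  # prints 750 -- about 750 megabytes
```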
And this is still just the beginning. Once the full sequence of human DNA has been disassembled and annotated, we will be able to recompile the resulting code for our own purposes. We will customize ourselves and
our children - and, by extension, their children and their children's children. In this way, we will change the course of evolution itself.
Twenty years ago, when the term gene splicing first entered the human vocabulary, doomsayers wasted no time in denouncing it, while government agencies and ethical study groups were quick to devise guidelines outlawing the creation of "improved" human beings. But Luddites and legislators may ultimately find that genetic information is just as hard to control as any other kind of data. In the long term, people will be able to make their own choices - and humanity will never be the same again.
Life models
A multistory white building with dark-tinted windows stands unobtrusively on the hilly, wooded campus of the University of California at San Diego. Through a spacious lobby decorated with lush computer graphics, down a short hallway, we come to a huge white room where monolithic cabinets are crowded together under bright fluorescent lights and the only sound is the muted roar of cooling fans. A Cray C90 looks like a 6-foot chunk of abstract sculpture beside a glass-fronted case containing the baroque plumbing of its five-stage cooling system. Nearby is a newer Cray, a T3D, and behind that an Intel Paragon, containing 400 separate processors.
This is the main lab of the San Diego Supercomputer Center.
Adjacent to the lab is a long, quiet, dimly lit, comfortably furnished room where grad students sit in front of big color monitors. Michael Gribskov, a quietly amiable, bearded man in his late 30s, hands me some 3-D glasses - the kind of liquid-crystal headset that audiences wear in IMAX theaters. Gribskov calls up a wildly complex image on the screen of a Silicon Graphics workstation. It consists of lines entangled with twisting ribbon. The 3-D effect makes it look as if I could thrust my hand into the center of this abstract jungle.
"This is a protein molecule," Gribskov says, "simplified to reveal its structure. Really, there are thousands of atoms here." He clicks a mouse button, and the image changes to show a vast, dense, sprawling mess
of tiny spheres. Then he drags the mouse, and the molecule rotates. Another click, and we zoom in.
Slightly more than 40 years ago, when James Watson and Francis Crick first deduced the structure of DNA, the only way they could check it was by commissioning craftsmen to fabricate hundreds of plug-and-socket metal parts. The scientists then spent days assembling the parts to prove that the atoms really would fit together the way they were supposed to. Even as late as the mid-1970s, chemists were still building models piece by piece, then making measurements with rulers to cross-check the structure against fuzzy images of real molecules rendered by X-ray crystallography.
Computers have changed all that. "But people still think of these structures as being static," says Gribskov. "In reality they have large-scale motions, and with a supercomputer we can simulate this." Animated by the Paragon in the lab next door, the 3-D image on the SGI shivers like a piece of stiff Jello.
Naturally, there's a serious motive behind these complex renderings. DNA controls the creation of protein molecules. When researchers can actually see how these microscopic building blocks fit together, they can understand their behavior. They can also use the data to predict whether compounds will "mate" successfully. Finally, they can establish a smart online database with which scientists can access that information - exactly what Gribskov and his team are doing.
SDSC's database also enables researchers to sort and compare molecular structures in an intelligent way, to make sense of all the data, which is doubling in quantity every 18 months. "When I was an undergrad," Gribskov says, "a pretty complete molecular biology book was a paperback less than 2 inches thick. Now, you couldn't cram it into a shelf of books. The volume of information is so extreme, it's slowing our ability to advance. It's very difficult to be a scientist these days and have an interest in anything else."
Upstairs in his small, modern office on the third floor, Gribskov makes himself comfortable in front of a Digital Alphastation and launches Netscape. A moment later we're at his homepage at The National Biomedical Computation Resource at SDSC. A couple of links from there we reach the data he's been compiling, complete with pictures.
"I'm a practical person," he says, "so I want the things I do to be useful. Over the years, a lot of people's work has not been implemented and has ended up merely as a footnote to history." He shakes his head; clearly, he doesn't want that to happen to him. "You have to develop the product for the end user," he says with a smile.
And what, precisely, is the application?
"Proteins like this," explains Gribskov, nodding toward the image displayed by his browser, "are implicated in many forms of cancer. Once we understand them, this will make a huge difference in medicine. Today, to cure cancer we give you a drug that kills all the cells in your body that are actively dividing. In the future, we'll give you a drug tailored to kill that cancer cell and that one only."
One molecule fits all
Genetics used to be a hopelessly vague science. Everyone knew that traits were passed from one generation to the next, but no one understood how. Scientists theorized that each trait was communicated by a microscopic messenger known as a gene, and that if you put all the genes together, they would form the human genome - the complete genetic code it takes to make a human. But no one knew how the genome worked or what it looked like.
In 1953, it was revealed that deoxyribonucleic acid, a copy of which sits in every cell, functions as that control mechanism. The human genome consists of about 100,000 genes - short sections of DNA that tell a cell how to build proteins, the basic building blocks of life.
Variants of DNA exist in all living things, from bacteria to blue whales. DNA is always built from the same four chemical bases: adenine, thymine, cytosine, and guanine, arrayed in pairs to form a double helix. The precise sequence of base pairs varies not only from one species to the next, but from one individual to the next, affecting everything from hair color to longevity.
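Because the bases pair strictly (adenine with thymine, cytosine with guanine), either strand of the double helix fully determines the other. A minimal sketch of that pairing rule - the function name here is illustrative, not from any particular software package:

```python
# Watson-Crick pairing: A pairs with T, C pairs with G. Reading one
# strand, the partner strand is the base-by-base complement, reversed
# (the two strands of the helix run in opposite directions).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    return "".join(PAIR[base] for base in reversed(strand))

print(reverse_complement("ATCG"))  # prints CGAT
```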
There can also be mutations. Some of these create beneficial, evolutionary improvements. More often, though, they interfere with normal functions.
More than 15 million Americans have one or more birth defects, 80 percent of them genetically caused. More than 3,000 diseases - hay fever, hypertension, hepatitis, dermatitis, Down's syndrome, lupus, pancreatitis, meningitis, muscular dystrophy, testicular cancer - will be present or absent depending on the vagaries of a person's genetic code. Two-thirds of serious hearing problems are genetically caused. Serious problems in the eyes and teeth are often genetic. The list goes on.
At first there was no obvious way to fix defects in DNA, since the molecule is far too small to be manipulated with ordinary tools. But some enzymes have the useful ability to cut DNA into pieces at well-defined points, allowing sections to be repaired or modified. This is how gene
splicing became a reality.
In the summer of 1980, UCLA Medical Center announced the first animal gene transplant. Ten years later, doctors at the National Institutes of Health (NIH) completed the first gene therapy on a human being, using DNA that had been extracted, modified, and replaced. Other experiments followed quickly, and by September 1996, the Food and Drug Administration had approved 160 protocols, mostly for treating serious conditions such as lung cancer and leukemia.
More action occurred outside human medicine, where the stakes - and controls - are not so high. In agriculture, the antifreeze gene from an arctic fish was transferred to soybean plants to protect them from low temperatures. Tomatoes that stay firmer longer were patented by Calgene; they are still being evaluated by the FDA.
All of these developments have been made possible by mapping DNA. At first, this was a slow, erratic business. In 1985, a centrally coordinated plan to discover the sequence of the entire human genome was discussed at a conference at the University of California at Santa Cruz. One year later, the Department of Energy expressed interest, following its studies of genetic damage caused by nuclear radiation. A year after that, NIH got involved, and the two agencies jointly announced the Human Genome Project, a huge cooperative enterprise that would begin in 1990 and reach its goal in 2005, drawing on contributions from laboratories all over the world.
Many scientists complained that DNA sequencing would be hideously expensive, diverting money from other research that seemed more important. The sequencers won out, though, and the result has been a torrent of data.
The database Michael Gribskov maintains at the San Diego Supercomputer Center is a small venture dedicated to just one family of molecules. A truly vast electronic library is needed to store all data from the Human Genome Project, along with other DNA sequences from thousands of species of animals and plants.
There are now three of these giant repositories: one in Tokyo; one in Cambridge, England; and a third, GenBank, in Bethesda, Maryland, maintained by the National Center for Biotechnology Information (NCBI), sponsored by NIH. The three databases constantly share and trade data so their archives are all complete. But GenBank, the oldest, has done the most to leverage the value
of genetic data by the intelligent application of powerful computers. By mid-1996, it listed 700 million base pairs from 18,000 different species. Storing this information is relatively easy. The hard part is interpreting it.
GenBank
The NIH campus is lushly landscaped. Modern buildings nestle amid low, rolling hills of neatly trimmed grass, and tall trees rustle in the wind. Washington, DC, lies just to the south, but this feels like open country.
In early-morning sunlight I walk past Bethesda Naval Hospital, where American presidents receive their medical care. Farther along a gently curving road is the National Library of Medicine; but my destination is a smaller, glass-walled building on a side road. Here, in a series of tiny cubicles, the NCBI scientists maintain GenBank.
Mark Boguski, a senior investigator, looks hearty, energetic, and highly motivated as he rests his lanky frame on a utilitarian chair in the cafeteria at 8:30 a.m. He doesn't fit the stereotype of a cautious, conservative government scientist; he's very informal, very direct.
"A DNA sequence, in isolation, is meaningless," he says. "You need to know where it came from and what it does. The National Library of Medicine has been collecting data since before the Civil War; it started as a handful of books that the Army Surgeon General owned. Today it indexes about 400,000 articles a year, and this literature is what gives sequences their meaning: the database doesn't just store sequences, it links them with all the citations. And that enables users to make new discoveries."
Around us, government employees are eating breakfast. Boguski doesn't seem to notice; he's too enthused by his topic. "Here's an example," he continues. "Someone cloned a gene for Alzheimer's disease and looked for similar sequences here. They found one in a nematode worm. So now they can do experiments on nematodes that have applications back to the human brain! Another example: a scientist wants to know more about a gene in mice. A search for the sequence finds similar proteins in yeast. So now she can do cheap experiments on vats of yeast instead of expensive experiments with mice, and ultimately the work will map back to another similar gene in humans."
This sounds simple - but it isn't. The key word here is similar. A gene may serve the same function in various species, but it won't have precisely the same structure. Evolution is a mess of trial and error. DNA sequences diverge gradually over millions of years, guaranteeing differences.
So how do you know if two genes really are comparable? GenBank uses cunning algorithms to assess similarity, but the ultimate test has to be done in a laboratory. Suppose you believe that a human gene and a yeast gene serve the same function. To prove it, you remove the gene from the yeast's DNA and insert the human gene instead. Then you wait and see if the yeast still grows normally. "This has been successfully done with various genes about 75 times so far," says Boguski.
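GenBank's real search algorithms must cope with gaps, insertions, and statistical significance, but the core notion of "similar" can be hinted at with a deliberately naive percent-identity score. A sketch, using made-up sequence fragments:

```python
# Naive similarity measure: the fraction of positions at which two
# sequences of equal length carry the same base. Real search tools
# are far more elaborate; this only illustrates the basic idea.
def percent_identity(seq_a: str, seq_b: str) -> float:
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical fragments differing by a single substitution:
human_fragment = "ATGGCCAAGT"
yeast_fragment = "ATGGCTAAGT"
print(percent_identity(human_fragment, yeast_fragment))  # prints 90.0
```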
Biologists are accustomed to this kind of cumbersome, wait-and-see routine, but computer programmers find it maddeningly imprecise. "Their natural tendency is to want things simple," says Boguski. "But nature is incredibly complex. There's a big demand now for computational biologists, and computer people are moving into the field, but we really need people who are biologists first, computer scientists second."
Boguski himself was originally an MD, and he admits that he sometimes misses interacting with patients and being directly involved in their care. "Still, at some point you have to decide where your work is going to have the most impact. What I'm doing now has much more of a positive influence than I could ever have with individual patients. DNA sequences are the future of medicine - and this, here, is the medical library of the 21st century."
Linking links
The principal architect of that library is Jim Ostell, who directed the basic design of GenBank's database in 1990. Ostell is lanky, long-haired, and bearded, with a nerdy but affable manner. He majored in zoology as an undergrad, then turned away from academia and took a job in 1972 gathering live organisms for a biological supply house. "I roamed meadows and ponds collecting amphibians, plants, and invertebrates in western Massachusetts," he recalls. He later went back to school for a master's and wondered how vastly complex organisms could grow from such simple beginnings. In 1979, he began working in the Harvard Bio Labs, where early attempts at DNA sequencing were under way.
The lab's only computer was a CP/M machine, comparable to a RadioShack TRS-80, which was used by a secretary to do word processing during the day. The available programs for manipulating chemistry data were frustratingly primitive. Ostell wrote one of his own, started giving it away, and after he finished his PhD, he left the lab to market his software. For a while he lived in a Vermont farmhouse, leading a serene life composing computer code.
Eventually, though, he feared he was losing touch with developments in molecular biology. When NCBI offered him a job in 1988, he accepted. Later, when he began receiving offers from pharmaceutical corporations that would double or triple his government salary, NIH gave him a special title and position, Senior Biomedical Research Scientist, which Ostell shared with just 11 other people. It exempted him from federal pay scales, although he says he'd still do better in the private sector.
He prefers GenBank, though, because it allows him to deal with the full breadth of his subject. In his office, Ostell draws three circles on a whiteboard to indicate three areas of data. One consists of DNA sequences, another the protein sequences that genes create. The third contains genetic and other related literature from NIH's Medline database, the world's most comprehensive online medical research tool. GenBank has it all. No matter where you start, everything is linked. Even old literature is linked forward to new literature, and naturally the system displays the most promising hits first.
"We computed the comparison of all known proteins to all other known proteins," Ostell says. "We did the same thing with comparisons of English text; in each paper we assigned weighting to words depending how often they repeated. We've now done this for the 1.2 million published citations in the genetic subset of Medline, and we're working on the remaining 8 million outside genetics." Thus, a newly identified DNA sequence can quickly be compared with others that may be better understood. Links can also be identified to medical conditions that result when a particular sequence is missing or damaged. Finally, there may be therapies that can be adapted and applied.
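The word-weighting scheme Ostell describes resembles what information-retrieval researchers call term-frequency weighting: count how often each word repeats in a document, then compare documents by the overlap in their weighted vocabularies. A rough sketch of the idea, using invented paper abstracts rather than real Medline records:

```python
# Weight each word by its frequency in a document, then compare two
# documents with a cosine similarity over those weights. A simplified
# sketch of term-frequency weighting, not GenBank's actual code.
import math
from collections import Counter

def word_weights(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

paper1 = word_weights("gene sequence similar to yeast gene")
paper2 = word_weights("yeast gene experiments on gene function")
print(cosine_similarity(paper1, paper2))  # shared words score the pair
```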
Initially, there was some backlash from researchers against a central government agency devising its own software, file structure, and user interface that everyone else would have to put up with. At one time NCBI was even accused of being "Stalinistic."
Fortunately, the man chosen to head it is a maverick biologist named David Lipman whose former ambitions included real-world occupations such as fiction writing and filmmaking. Lipman made it clear he didn't give a damn about dress codes or organizational protocol. He assembled a young, multinational workforce, and he infected them with a cheerfully arrogant, can-do attitude. As a result, GenBank behaves unlike any normal government agency.
Lipman was determined that the data should be accessible by anyone, anywhere, so Ostell's team developed user-friendly software for accessing GenBank, in versions for Unix, DOS, Macintosh, and even VMS systems. The flexible approach paid off: GenBank now processes 40,000 inquiries per day, not including casual visitors to its Web site. It measures its success by the number of people who use it - more like a market-driven business than a bureaucracy.
GenBank's system is based on Sun SPARCcenter 2000s, while four hulking Power Challenge XLs handle the similarity comparisons in sequence searches. Three full-time staffers run a help desk, responding to users by email, phone, or fax. By modem, GenBank receives newly deciphered DNA sequences from hundreds of sources all over the world. Often, different laboratories will be studying the same gene - except that it won't be precisely the same, because of the slight genetic variation between the individuals sampled. As a result, GenBank holds multiple versions of many sequences, no two of them exactly alike. When the Human Genome Project completes its task, the final sequence will be selected from thousands of fragments, making it a composite, like a morphed photo mixing multiple ethnic features.
So far, less than 5 percent of the total human genome has been sequenced. "Biology today is still like physics before Newton," says Ostell. "We are only just starting to see meaningful patterns in the data."
But once we understand DNA, he says, "it will make the electronics revolution look like a flash in the pan." He smiles thoughtfully. "I would guess that a lot of species die around this stage, because they acquire sufficient power to kill themselves."
One could argue that the nuclear physicists got us there first. And yet, Ostell's helping to accelerate the process.
He shrugs. "Nature produces more terrifying creatures than we can imagine. Humans are now the most widely available prey species on the planet, and the most likely predator is a virus. We need to understand the predators in order to defend ourselves."
But is it wise to make the information so publicly available?
"I think it's important that a clever grad student in Smallville, USA, should have equal access. Our best defense against knowledge being used in a narrow, dangerous way is to allow a broad number of people to participate. I have more faith in humanity as a mass than in any one institution."
Who owns DNA?
There's no guarantee, though, that everyday citizens will make wise use of genetic knowledge when it affects their personal lives. One case history illustrates the point.
Orchemenos is a small village in Greece where many people happen to have a gene that causes sickle-shaped red blood cells. On its own, this gene is harmless - many people of African descent also have it, and it may even provide some resistance to malaria. But if both parents carry the gene, each of their children runs a one-in-four risk of sickle-cell anemia, which is often fatal.
Clearly, couples should know if they have the gene before they think about having children. A group of researchers tested the villagers at Orchemenos, assuming that carriers would behave rationally and would pair with noncarriers in order to mix the genes safely and protect the community's children. The noncarriers, however, refused to cooperate. Even though the gene was harmless on its own, carriers became stigmatized and noncarriers refused to marry them. In the end, the carriers became a shunned subclass who were forced to marry among themselves, making the situation even worse than before.
Health workers today are extremely cautious about communicating genetic information to parents. Arthur Caplan, director of the Center for Bioethics at the University of Pennsylvania, tells the story of a pregnant woman at a large medical center who was informed by a genetics counselor that her fetus had an extra Y chromosome that might increase the chance of the child displaying aggressive or criminal behavior. Caplan was not particularly happy about the outcome: "After talking about the situation with their family doctor and various friends, the couple decided to abort the pregnancy."
Fetal testing is not yet a standard procedure, but demand for it seems sure to grow as we gain the capability to detect a wider range of abnormalities - and correct many of them before a baby is born. How will parents evaluate this information? If a fetus has a gene that offers a 50 percent chance of muscular dystrophy developing later in life, will the woman carrying it opt for abortion? If treatment for the fetus is possible, but she rejects it and the baby is born with a defect, will she be considered negligent? Does that mean the law will protect fetuses from parental abuse? How will this affect the law on abortion?
Other problems affect only adults, especially in the workplace. "What do you do," asks Michael Gribskov, "if you run a chemical plant and discover that an employee has an increased risk of getting cancer by coming into contact with chemicals? Do you fire this person? What if his risks are merely doubled, and he wants to stay on the job?"
Such dilemmas seem likely to trigger disputes and litigation that will make other issues raised by technology in the workplace seem trivial by comparison.
But there's more: genetic testing can also be used to discriminate among job applicants. If someone has a heightened risk of obesity, alcoholism, diabetes, or Alzheimer's - and if this can be discovered by quick analysis of a single hair - won't an employer feel tempted to run this test, covertly if necessary? Genetic discrimination eventually may be prohibited by law, but it isn't yet.
Some progress has been made toward the goal of protecting an individual's right to his or her genetic data. Patricia Roche is an expert on public health law at the Boston University School of Public Health. She codrafted a document titled "The Genetic Privacy Act" because, she says, "our primary concern was to protect the privacy of the individual."
A revised version of the text was introduced as a bill in the Senate at the end of June. Roche is uncertain whether this will be voted into law, but 19 states so far have enacted their own genetic privacy or nondiscrimination legislation.
The most fundamental ethical question, though, is far harder to solve. Even if we satisfy the need for privacy and informed consent, are there some genetic procedures that are so inherently unethical they should not be tolerated under any circumstances?
There's been no shortage of committees, task forces, study groups, and agencies wanting to discuss or control genetic engineering. NIH has created the Recombinant DNA Advisory Committee to review proposals for gene therapy. The National Center for Human Genome Research has established
a department for discussing ethical, legal, and social implications. Unesco has set up an International Bioethics Committee - and in March 1996 it released a Declaration on Protection of the Human Genome urging that all genetic research should be government regulated, since "the human genome
is the common heritage of humanity."
This is a startling concept. In effect, it classifies DNA as a kind of public property. You have the right to control your particular version of it, but you may not have the right to change it if the modifications will be inherited by future generations - because this will be altering the overall human gene pool, which belongs to everyone.
Darryl Macer served on the Unesco bioethics committee and is founder of the Eubios Ethics Institute, based in New Zealand and Japan. "It would certainly be in the spirit of the Universal Declaration of Human Rights, Article 27," he says, "to interpret the DNA sequence as something of shared ownership."
Macer hopes the Unesco genome declaration will be adopted by the United Nations General Assembly in 1998. If the United States signs this resolution, Americans could find themselves in the bizarre position of being forbidden by international treaty from making certain kinds of alterations to the seed from which they were created.
The draft Unesco resolution doesn't rule out somatic therapy, which alters the DNA only in mature cells. Germline therapy is the no-no, since it changes DNA in sperm or ova, and those changes will be passed on to every subsequent generation. Unesco ethicists will grudgingly tolerate this, if it is used only to correct bona fide birth defects. But they adamantly rule out germline therapy to "enhance" future children.
Only one problem: how can we define the difference between a correction and an enhancement?
Customized kids
The situation would be simpler if there were some kind of normal baseline for human beings, with defects lying below this level and enhancements above it. But life is not simple. Children with a genetic tendency to be obese are likely to die earlier than average; does this mean obesity is a disease that can be corrected in the germ cells, or would that be a form of enhancement? Poor eyesight may be a liability if a child wants to be an airline pilot; should this trait be erased from the germline? Should shortness or tallness be considered an abnormality that can be corrected?
Obviously there is a huge gray area between "enhanced" and "corrected" traits, and if germline therapy is allowed, there'll be a temptation to extend it, just as plastic surgery is now often used to enhance breast size instead of merely correcting damage caused by mastectomies.
Fear of this slippery slope has frightened the Council of Europe into proposing a ban on all germline therapy in its Draft Bioethics Convention. Likewise, some scientists have sworn a solemn promise never to interfere with germ plasm - sperm or ova - under any circumstances.
Ethicist Caplan is not impressed. He says it's an easy promise for scientists to make, because "none of them believe that anyone is even remotely close to knowing how to alter the germlines of a human being, much less whether germline engineering will actually work." The pledge, he says, is "an expedient way to silence critics."
Caplan feels that germline therapy could be extremely valuable in some cases. At the very least, he says, "some genetic diseases are so miserable and awful that at least some genetic interventions with the germline seem obligatory."
Mark Rothstein, director of the University of Houston Health Law Institute, stated the case bluntly to a local newspaper. "You could argue that it is inefficient to do somatic therapy," he said. "Why not go in there and fix things once and for all?"
Currently, most parents feel they have only two options: allow a fetus to come to term, or abort the pregnancy. Presumably, mothers who rule out abortion on religious grounds would also oppose fetal enhancement, since they would see it as interference with "God's will." But pro-choice groups routinely assert that the fetus is not a person and is part of the mother. By this logic, the mother should be free to change the fetus in any way she wishes - leading to potentially bizarre consequences.
"You see how crazy people get, breeding dogs," comments Jim Ostell. "Does the parent have the right to impose that kind of craziness on children? If you can decide that your son should be 8 feet tall, should you be allowed to make that decision? Frank Zappa had the right to name his daughter Moon Unit, but would he also have the right to give her three eyes?"
Bearing in mind that the debate over abortion has sparked one of the ugliest unresolved conflicts in recent American history, it's not hard to imagine the kind of outcry these capabilities would create.
Sequence yourself
Experiments in germline therapy are unlikely within the next decade, and even somatic therapy has been severely limited so far, simply because it costs so much. Most experts agree, though, that the cost of sequencing DNA will continue to diminish, enabling the Human Genome Project to wrap up ahead of schedule and under budget. Will we ever reach a point where it's so cheap to sequence DNA that a consumer can have her genome scanned and stored on a CD-ROM?
This is hard to answer, because the technology is so young. Predicting DNA sequencing costs today is like trying to predict the future of computers back in the 1950s.
Jane Peterson is program director for genomic sequencing at The National Center for Human Genome Research, which allocates $119 million a year for labs involved in the Human Genome Project. Peterson administers grants for large-scale mapping and sequencing. "When the project started in 1990," she says, "the cost was about $10 per base pair." This was clearly prohibitive
- there are 3 billion base pairs in the complete human genome.
Thus, one of Peterson's highest priorities has been cost reduction. "It's been coming down," she says, "through better technology and efficiencies of scale, and is now around 50 cents or less. We've recently allocated
$42 million for six pilot projects to develop new strategies and technologies that should bring the cost down even more."
Some people remain skeptical. "The ABI multicolor sequencer is the most highly efficient one available today," says Chris Hogue, a fluorescence specialist who now develops 3-D rendering techniques for GenBank. "But it was used right from the start of the project and has not been improved by any new technology that would really bring costs down. Also, it's hard to eliminate labor costs that result from preparing the sample and then reading the output. The raw output has to be rationalized. And scientists need to write commentary. Overall, I think the realistic minimum is 10 cents per base pair using foreseeable technology."
Could atomic-force microscopes "feel" the atoms in a string of DNA?
"This has been tried," says Hogue, "and at least one mistake was published where scientists were looking only at the substrate that the DNA was stuck to. It's really not a simple problem."
Ten cents per base pair may sound modest, but with only 5 percent finished so far, it would still place the cost of the complete sequence at nearly $300 million. Mark Boguski is more optimistic: "When the Genome Project ends in 2003 or 2004," he says, "we should be paying a fraction of a cent per base."
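The arithmetic behind these cost figures is simple enough to check. A quick sketch, multiplying the genome's 3 billion base pairs by the per-base prices quoted above (Boguski's "fraction of a cent" is taken here as a tenth of a cent, purely as an assumption):

```python
# Back-of-envelope sequencing costs for the full human genome.
BASE_PAIRS = 3_000_000_000  # 3 billion base pairs in the complete genome

for label, cost_per_base in [
    ("1990 start", 10.00),       # ~$10 per base pair
    ("today", 0.50),             # ~50 cents per base pair
    ("Hogue's floor", 0.10),     # 10 cents per base pair
    ("Boguski's hope", 0.001),   # assumed: a tenth of a cent per base
]:
    total = BASE_PAIRS * cost_per_base
    print(f"{label:>14}: ${total:,.0f}")
```

At 10 cents per base, the full sequence indeed comes out around $300 million; at a tenth of a cent, it drops to the low millions.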
If that turns out to be true, sequencing might be affordable for some individuals 10 or 15 years later. But why would anyone want a copy of his or her own DNA data?
The answer is that DNA is more than just data - it's a program. Once we can read it and interpret it, we should be able to do what anyone would normally do with a program: load it into a computer and run it.
Virtual clones
"Computer models are going to become increasingly complex," says Michael Gribskov, back in his office at the San Diego Supercomputer Center. "I'm not just talking about models of molecules, but models describing all the interactions within a cell."
Gribskov admits that this is a huge challenge, because so many cell functions are not yet understood. "Still," he says, "we may not need a complete model of the cell to simulate the processes." In other words, we should be able to model the relevant processes of a cell without tracking individual molecules, just as we can describe and predict weather patterns without tracking individual raindrops.
When this becomes possible, primary research will no longer require lengthy, expensive lab tests. Instead, experiments will be played out in computer memory, according to rules that describe the ways in which chemicals react with each other. Moreover, using parallel processing, we should be able to follow many of these processes running simultaneously. This would make it feasible to simulate multiple cells as they divide and interact to create a living organism.
Cellular automata programs already create complex and beautiful patterns using elementary math to control the behavior of millions of pixels. Likewise, says Gribskov, computers should be able to track the behavior of millions of cells. At this point it will be feasible to depict, for instance, all the life processes of a nematode - one of the simplest living things.
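Cellular automata are easy to demonstrate. Below is a minimal sketch of Wolfram's Rule 30, a one-dimensional automaton in which each cell's next state depends only on itself and its two neighbors - yet the resulting pattern looks startlingly complex (the grid width and printing characters here are arbitrary choices):

```python
# Rule 30: the new state of each cell is looked up from the bits of the
# number 30, indexed by the (left, center, right) neighborhood pattern.
RULE = 30
WIDTH = 31

row = [0] * WIDTH
row[WIDTH // 2] = 1  # start from a single live cell in the center

for _ in range(15):
    print("".join("#" if c else "." for c in row))
    # Encode each neighborhood as a 3-bit index and consult the rule.
    row = [
        (RULE >> (row[(i - 1) % WIDTH] * 4 + row[i] * 2 + row[(i + 1) % WIDTH])) & 1
        for i in range(WIDTH)
    ]
```

A few lines of lookup logic, millions of cells: the same principle Gribskov invokes for tracking cell behavior at scale.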
The implications extend even further. What's to stop us from scaling up from a nematode to a human fetus? In fact, if we have enough computing power to track all the cell reactions, we should be able to grow a virtual person, an information entity - call it an infomorph - inside a computer.
There are trillions of cells in an adult human, but here again simplification should be possible, especially since we would be primarily interested in the brain rather than the body. We certainly won't need to know the position and function of every cell to replicate the overall look, feel, and behavior of a human being on the macro scale.
At this point science turns into science fiction. Have you ever wished you could have a long-lost relative restored to life? If you have a strand of hair or some fingernail clippings, a simulation is theoretically possible. In the physical world, cloning a person would cause legal and ethical complications; but how can anyone object if you merely use your loved one's data to "grow" a simulated version inside a computer? This would be a whole lot safer, less controversial, and perhaps less expensive than a real-world cloning operation of the type described in Jurassic Park.
Perhaps you'd prefer an infomorph of Albert Einstein on your hard drive for those idle moments when you crave a little intellectual stimulation. Einstein's brain has been preserved, but we don't need to dissect and study it - we just need a sample of its DNA in order to grow a perfect virtual replica, with all the same attributes the scientist enjoyed in real life. Of course, an Einstein infomorph will still need to be painstakingly nurtured and educated, mimicking the maturation process of a real human being. But there will be a powerful financial motive for doing this, because as soon as it reaches maturity, the infomorph will be a highly marketable product that can be copied and distributed or rented out on a time-sharing basis. In fact, infomorphs could be the killer app of the 21st century. And remember, we're not talking about cheap, unconvincing AI agents that are merely programmed to imitate famous people. We're talking about a copy grown from the original source.
From the infomorph's point of view, the situation might seem a bit grim, being forced to dwell in computer memory and converse with the future equivalent of AOL users. Still, some simulated audiovisual inputs should alleviate sensory deprivation, and the infomorph might be radio-linked with robot rovers if it wants the vicarious pleasure of exploring the physical world.
How much computing power will be necessary to achieve this? Right now, we don't even know what we need to simulate a single cell, because information about molecular processes is incomplete. But we know much more about the macro scale. In fact, there have been decent estimates of the power needed to simulate human thought on a real-time basis.
Ralph Merkle, a computer scientist at Xerox PARC, published a paper in 1989 evaluating intellectual processing power. He measured it in three different ways.
Method One: There are about 1 quadrillion synapses in the brain. They process about 10 nerve impulses per second. Therefore the brain carries out about 10 quadrillion synapse operations per second.
Method Two: The human retina (which has its own processing power and is relatively well understood) contains about 100 million nerve cells performing about 10 billion addition operations per second. The brain is bigger than the retina by a factor of somewhere between 100 and 10,000. Therefore the brain must process between 1 and 100 trillion operations per second.
Method Three: The human brain consumes about 25 watts of energy, of which about 10 watts are used directly for mental processes. We know the power consumption of a single synapse and can estimate the average distance between synapses. This means we can figure the maximum number of synapse operations that can be supported by the brain's "power supply." The upper limit turns out to be 2 quadrillion synapse operations per second.
Averaging out these estimates, it looks as if the brain may run at around 1 quadrillion synapse operations per second. How does this compare with computers? We currently have massively parallel equipment that is capable of almost 1 trillion floating-point operations per second. If computing power continues doubling every 18 months, hardware should catch up with brainpower sometime around 2020. At that point infomorphs become theoretically possible.
One element still needs to be considered, though: memory. Human memory seems enormous, since it stores so many sights, sounds, and concepts. How many bytes will we need to replicate it?
Merkle has examined this question in another paper, published in 1994, which tackled the general problem of making an accurate copy of a human brain. First, he asked how much physical detail is important. Do we need to know the position of every brain atom? No. Every molecule? Probably not. The contents of each synapse? Maybe. Research indicates that one bit of memory information is actually stored across thousands or even millions of synapses, but let's play it safe and store the states of all of them.
According to Merkle, if each synapse is described by one byte and there are 1 quadrillion synapses, then we'll need a memory of 1 quadrillion bytes - that's 1,000 terabytes, a terabyte being 1,000 gigabytes. You can already buy a terabyte in the form of optical drives for under $100,000, and storage prices keep falling. Clearly it will be no problem for our infomorphs to memorize everything that happens to them in their virtual world. And unlike us, they need not get forgetful as they grow older. In fact, they may never need to die at all.
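The unit arithmetic is worth spelling out. A quick sketch using the figures above (the per-terabyte price is the one quoted for optical drives):

```python
# One byte per synapse: how big is a complete brain snapshot?
synapses = 10**15            # ~1 quadrillion synapses
bytes_needed = synapses * 1  # one byte per synapse

terabyte = 10**12            # a terabyte is 1,000 gigabytes
print(f"{bytes_needed // terabyte:,} terabytes")  # 1,000 terabytes

# At the quoted price of under $100,000 per terabyte of optical
# storage, a full snapshot would run about $100 million today -
# which is why falling storage prices matter.
cost = (bytes_needed // terabyte) * 100_000
print(f"${cost:,}")  # $100,000,000
```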
Again, this might seem quite a stretch, starting from the small premise that DNA is a set of instructions like a computer program. But the chain of logic is unbroken. A human being is a massive system made of cells that interact. When all the functions and interactions of DNA are understood, they can be reduced to a set of rules that determine the way cells behave. If a computer is large enough and powerful enough, why shouldn't it simulate the result of all these rules running simultaneously?
Some people may feel tempted to store and run simulations of themselves, thus achieving an ersatz immortality. Picture Joe User in 2070, sending a tiny skin sample by FedEx to a mail-order sequencing lab. A couple of days later he receives his complete DNA sequence via email. Now all he needs is a computer and the necessary software to grow a copy of himself, much as people today keep tropical fish.
Joe's too busy to be a full-time parent, so he hires an online nanny service that maintains its own infomorphs to educate young virtual clones. This is a huge advantage, since the info-entities can communicate at their own accelerated speed, shortening childhood to a matter of months. Within
a year, Joe Jr. is up and running as an adult. Now Joe Sr. can preserve all his adult memories by copying them into his virtual self. Extracting synapse states from his brain would be impossible by any method currently known, short of a gruesome and destructive freeze-and-peel operation. But nanotechnology, for instance, might make it work, and as a result, Joe's life knowledge and experience would be permanently preserved. His children's children would be able to talk to Joe Jr. anytime, long after Joe Sr.'s physical self has died.
But why should Joe Sr. remain a hostage to his mortal physical body? Why not copy his entire self into the virtual world, and say adios to meatspace?
This is so speculative, it's hard to view it seriously. Still, a deadly serious subtext remains: when we can describe the fundamental seed of a human being entirely as computer data, our whole conception of the human condition starts to change. When bits accurately represent DNA, those bits literally become a life-form of their own, and physical biology looks increasingly primitive by comparison.
Free DNA
The specter of genetic modification has spawned endless government reports, blue-ribbon position papers, and international resolutions. Doomsayers such as Jeremy Rifkin are already carving out a steady income denouncing the whole field of modern genetics.
Yet all the edicts and opinions may ultimately be irrelevant if the demand for DNA sequencing and gene therapy causes costs to diminish to the point where techniques are accessible to individual consumers. At that time, even scientists and doctors are liable to find themselves out of the loop.
Diagnosis will no longer entail physical examinations, expensive tests using large pieces of hardware, or the physician's traditional educated guess based on the look and feel of a patient. Drugs will be precisely targeted, avoiding dangerous side effects. Surgery may still be needed for some conditions, but much less often.
Smart sensors driven by expert systems will diagnose abnormalities. Automated treatment systems will be mass-produced as consumer products, in much the same way that blood-pressure sensors have become an off-the-shelf item. Genetic drugs will be highly specific and therefore extremely safe, allowing you to modify your DNA in the privacy of your own home. And if the FDA doesn't approve, Americans will likely turn to foreign sources of supply, just as AIDS patients do today.
Thus, the system of government controls and conventional medical ethics will be loosened, and individuals will be able to start making their own decisions. Some may choose to do nothing at all, which is plausible considering recent findings that show a majority of Americans believe any tampering with DNA to be against the will of God. But if parents have a chance to protect their children from a crippling defect, how many will just say no to genetic therapy? If they can make a child stronger or healthier or more intelligent, won't they feel tempted by this, too? And when genetic data is as cheaply and easily available as computer data, won't we all start to look at life in a more pragmatic, less reverential way?
Perhaps greater personal power and freedom will ultimately inspire a new set of ethics to guide individual behavior. Alternatively, we may see companies calmly and coldly manipulating genetic data in order to market new life-forms, from customized pets to infomorphs.
Either way, whatever people choose to do, it will be very difficult to stop them - because it's becoming clear now that *human beings consist of information.* And, as everyone knows, information just wants to be free.