The genetic code is the biochemical basis of life, and given its central importance, there are rules. Double-stranded DNA is transcribed to single-stranded RNA, which is processed through protein-building ribosomes. Each set of three nucleotide bases (a codon) corresponds to a particular amino acid; when a given triplet is being read, the appropriate amino acid swoops in and is added to a growing chain. A protein is born.
Two critical components of this instructional framework are the “start” and “stop” commands – without them, a ribosome wouldn’t know when to begin recruiting amino acids, or which ones to bring in. A one-base shift in the reading frame would result in a completely different protein product, so the instruction manual and the construction team need to be on the same page. AUG (quick refresher: in RNA, U takes the place of T) is the most common start codon, initiating the protein with a methionine amino acid. Three codons, each with its own color-based name*, stop protein synthesis in its tracks and release the chain of amino acids into the cell: UAG (“amber”), UAA (“ochre”), and UGA (“opal”).
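To make the start, stop, and reading-frame rules concrete, here is a minimal Python sketch of translation under the standard genetic code. The function name `translate`, its `start` parameter, and the toy RNA message are illustrative assumptions, not taken from any particular library, and the sketch ignores everything a real ribosome does beyond reading triplets.

```python
# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna, start=None):
    """Read codons from `start` (default: the first AUG) until a stop codon."""
    if start is None:
        start = rna.find("AUG")
    if start < 0:
        return ""  # no start codon, so no protein
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGGCUUGGAAAUAA"  # hypothetical toy message
print(translate(message))           # MAWK
print(translate(message, start=1))  # WLGN
```

Shifting the reading frame by a single base turns the same message into an unrelated chain, which is why the start codon matters so much.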
The Joint Genome Institute, a Department of Energy consortium located in Walnut Creek, CA, has emerged as a leader in mining the seemingly endless deposits of genetic data coming from sequencing efforts around the world. Researcher Natalia Ivanova was parsing this data when she noticed something strange: several bacteria had really short genes, around 200 nucleotides long, a far cry from the more typical 800-900 nucleotide length she was expecting. Short genes mean short proteins, and in this case, seemingly nonfunctional ones. The only way to make sense of these sequences was if “stop” codons didn’t actually mean “stop”.
Ivanova experimented computationally with various codon reassignments, and ultimately found that things looked a lot more normal if “opal” was translated as a glycine amino acid. In other words, “the same word means different things in different organisms,” says Eddy Rubin, JGI’s Director. The microbial world is multilingual.
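The idea behind that kind of computational experiment can be sketched in a few lines of Python. This is a toy illustration only, not the JGI team's actual pipeline: the helper names `translate` and `reassign` and the short example gene are hypothetical, and each stop codon is swapped for glycine in turn purely to show the effect on protein length.

```python
# Standard genetic code in U, C, A, G order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO_ACIDS))
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}


def translate(rna, table):
    """Translate from the first AUG until a codon that `table` marks as stop."""
    start = rna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def reassign(codon, amino_acid):
    """Return a copy of the standard table with one codon given a new meaning."""
    table = dict(STANDARD)
    table[codon] = amino_acid
    return table


gene = "AUGGCUUGAAAAGGUUGACCCUAA"  # hypothetical gene with two in-frame UGA codons
print("standard", translate(gene, STANDARD))  # standard MA
for codon, name in STOP_NAMES.items():
    # Try reading each stop codon as glycine ("G") and see what comes out.
    print(name, translate(gene, reassign(codon, "G")))
# amber MA
# ochre MA
# opal MAGKGGP
```

Only the opal reassignment lets the ribosome read through to the final UAA, turning a two-residue stub into a longer chain. That is the same kind of signal, at toy scale, that made the short bacterial genes look normal again.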
Recoding events have been seen before, but the JGI team was able to sift through massive amounts of sequence data to conduct the first thorough search for re-assigned stop codons. And with 5.6 trillion nucleotides from 1776 samples at their fingertips, the researchers cast a wide net. Tanja Woyke, an author on the study and the Microbial Genomics program lead at JGI, presented some of the group’s findings at the American Society for Microbiology conference last week in Boston. “We looked at all kinds of sequence data,” she explains, “and these recoding events are found across the board.” From the human mouth to cave water to marine sites and the cow gut, alternative codon translation tables led to more intelligible results in a range of environments. And it wasn’t just opal that could be modified: ochre and amber reassignments accounted for 24% and 7% of the recoded sequences, respectively. The highest percentage of alternative codon use occurred in a sulfide-rich groundwater sample, where 10.4% of genetic material demonstrated altered “stop” codons.
Recoded stop signs were also found in several bacteriophages, viruses that infect microbes and hijack host machinery to make more viral particles. Given the co-option of microbial hardware, it seems logical that both sets of genetic software would need to be written in the same language, but that isn’t always so. In one instance, amber-recoded viruses were found in a setting lacking any amber-recoded microbes, which raises a couple of possible scenarios. Either the microbial community was evolutionarily ahead of the game, or, more intriguingly, recoded viruses can still infect hosts with the standard genetic code.
The genetic code has traditionally been viewed as a universal set of instructions, exquisitely tuned to maintain robust stability and allow evolution-sustaining mutations. But the pervasive occurrence of recoded stop codons, and the backchannel crosstalk between microbes and viruses, paint a more intricate picture of multilingual genetic instructions.
* The first labeled stop codon, UAG, was named for Harris Bernstein, whose last name means “amber” in German. Running with the theme, other teams named subsequent discoveries after colors, UAA as ochre, and UGA as opal. It’s a case of name-based punnery reminiscent of Southern, Northern, and Western blot analyses.